Cross-Reference to Related Applications
This application claims the benefit of the filing date of U.S. application
Serial No. 10/702,228, filed on November 5, 2003 and U.S. application Serial
No. 10/678,961, filed on October 3, 2003, the disclosures of which are
incorporated by reference herein.

Background of the Invention 

Molecular biotechnology has revolutionized the production of protein
compounds of pharmacological importance. The advent of recombinant DNA
technology permitted for the first time the production of proteins on a large scale
in a recombinant host cell rather than by the laborious and expensive isolation of
the protein from cells or tissues which may contain minute quantities of that
protein. The production of proteins, including human proteins, on a large scale in
a host requires the ability to express the protein of interest in a host cell, e.g., a
heterologous host cell. This process typically involves isolation or cloning of the
gene encoding the protein of interest followed by transfer of the coding region
(open reading frame) into an expression vector which contains elements (e.g.,
promoters) which direct the expression of the desired protein in the host cell.
The most commonly used means of transferring or subcloning a coding region
into an expression vector involves the in vitro use of restriction endonucleases
and DNA ligases. Restriction endonucleases are enzymes which generally
recognize and cleave a specific DNA sequence in a double-strand DNA
molecule. Restriction enzymes are used to excise a DNA fragment which
includes a coding region of interest from the cloning vector, and the excised
DNA fragment is then joined using DNA ligase to a suitably cleaved vector with
transcription regulatory sequences in such a manner that a functional protein can
be expressed when the resulting expression vector is introduced to a cell or an in
vitro transcription/translation mixture.

A problem in controlling fragment orientation in fragments generated by
restriction enzymes is that many of the commonly used restriction enzymes
produce termini that are rotationally equivalent, and therefore, self-ligation of
DNA fragments with such termini is random with regard to fragment orientation.
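Rotational equivalence can be stated as a simple sequence test. The sketch below is illustrative only. It assumes overhangs are written 5' to 3' using only A, C, G and T, and the function names are hypothetical. An overhang can anneal to an identical copy of itself only if it equals its own reverse complement. That holds for the 4-nt AATT overhang left by EcoRI (G^AATTC). It can never hold for an odd-length overhang, such as the 3-nt overhangs left by SfiI.

```python
# Hypothetical helpers illustrating rotationally equivalent (self-complementary) ends.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_self_complementary(overhang: str) -> bool:
    """True if two copies of this overhang (written 5'->3') can anneal.

    That happens exactly when the overhang is its own reverse complement,
    so ends carrying it can join head-to-head, tail-to-tail or head-to-tail.
    """
    return overhang.upper() == reverse_complement(overhang)


if __name__ == "__main__":
    print(is_self_complementary("AATT"))  # EcoRI end: True
    # No 3-nt overhang is self-complementary: the middle base would have
    # to pair with itself.
    print(any(is_self_complementary(a + b + c)
              for a in "ACGT" for b in "ACGT" for c in "ACGT"))  # False
```

Odd-length overhangs are therefore a convenient way to obtain ends that cannot self-ligate in the wrong orientation.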
Hartley and Gregori (Gene, 13:347 (1981)) reported a technique to control
fragment orientation during ligation, which required the introduction of AvaI
sites flanking either end of the cloned fragment (also see Hartley and Gregori,
U.S. Patent No. 4,403,036). Since AvaI cleavage produces distinguishable ends,
self-ligation of the fragment results in a strong bias toward head-to-tail
orientation. This is so because head-to-head and tail-to-tail ligation results in
base mismatches. The polymerized molecules were then inserted into a vector
and used to transform E. coli.

In a similar approach, Ikeda et al. (Gene, 71:19 (1988)) produced head-to-tail
tandem arrays of a DNA fragment encoding a human major histocompatibility
antigen that was flanked by SfiI cleavage sites. SfiI produces single-strand DNA
overhangs that are not rotationally equivalent. SfiI sites have also been used to
produce copolymers of gene expression cassettes and selection markers, which
can be used to transfect cells (Monaco et al., Biotechnol. Bioeng., 20:157 (1994);
Asselbergs et al., Anal. Biochem., 243:285 (1996)). Monaco et al. treated the
copolymer with a restriction enzyme to cleave the DNA at the 3' end of the
selectable marker gene. In this way, transfected DNA molecules contain only
one selectable marker gene per copolymer.

Class IIS restriction enzymes can generate totally asymmetric sites and
complementary cohesive ends. Kim and Szybalski (Gene, 71:1 (1988))
introduced sites for BspMI, a class IIS restriction enzyme, at either end of cloned
DNA. Self-ligation of the cloned DNA provided multimers comprising repeat
units in the same orientation. Similarly, Takeshita et al. (Gene, 71:9 (1988))
achieved tandem gene amplification by inserting a fragment encoding human
protein C into a plasmid to introduce asymmetric cohesive ends into the
fragment. In this case, sites for the class IIS enzyme BstXI were used. The
multimer was then cloned into a cosmid vector comprising a neo gene, packaged
into lambda phage particles, and amplified in E. coli. The cosmid vectors were
then introduced into Chinese hamster ovary DHFR- cells, which were treated
with G418 to select for cells that expressed the neo gene. Takeshita et al. also
found that cells expressed human protein C, albeit at lower levels, following
transfection with unpackaged tandem-ligated DNA comprising copies of the
cosmid vector and the human protein C gene.
A similar approach was described by Lee et al. (Genetic Analysis:
Biomolecular Engineering, 13:139 (1996)), who amplified target DNA as
tandem multimers by cloning the target DNA into a class IIS restriction enzyme
cleavage site of a vector, excising a monomeric insert with the class IIS
restriction enzyme, isolating monomeric inserts, self-ligating the inserts, and
cloning the multimers into a vector. According to Lee et al., such a method is
useful for polymerizing short DNA fragments for the mass production of
peptides.

Another approach for forcing directional ligation is to devise synthetic
linkers or adapters that are used to create asymmetric cohesive ends. For
example, Taylor and Hagerman (Gene, 53:139 (1987)) modified the
Hartley-Gregori approach by attaching synthetic directional adapters to a DNA
fragment in order to establish control over fragment orientation during ligation.
Following polymerization, the multimers were ligated to a linearized vector
suitable for E. coli transformation. Stahl et al. (Gene, 89:187 (1990)) described a
similar method for polymerizing DNA fragments in a head-to-tail arrangement.
Here, synthetic oligonucleotides were designed to encode an epitope-bearing
peptide with 5'-protruding ends complementary to the asymmetric cleavage site
of the class IIS restriction enzyme BspMI. After polymerization, the
peptide-encoding fragments were inserted into the unique BspMI cleavage site of
a vector, which was used to transform E. coli. Clones were screened using the
polymerase chain reaction, and then subcloned into prokaryotic expression
vectors for production of the peptides in E. coli.

Nevertheless, the ability to transfer a desired coding region to a vector
with transcription regulatory sequences is often limited by the availability or
suitability of restriction enzyme recognition sites. Often multiple restriction
enzymes must be employed for the removal of the desired coding region, and the
reaction conditions used for each enzyme may differ such that it is necessary to
perform the excision reactions in separate steps. In addition, it may be necessary
to remove a particular enzyme used in an initial restriction enzyme reaction prior
to completing the remaining restriction enzyme digestions. This requires a
time-consuming purification of the subcloning intermediate. It also may be
necessary to inactivate restriction enzymes prior to ligation.
Methods for the directional transfer of a target DNA molecule from one
vector to another in vitro or in vivo without the need to rely upon restriction
enzyme digestions have been described. For example, the Creator™ DNA
cloning kit (Clontech Laboratories, Inc.) uses Cre-loxP site-specific
recombination to catalyze the transfer of a target gene from a donor vector to an
acceptor vector, which is a plasmid containing regulatory elements of the desired
host expression system (see also U.S. Patent No. 5,851,808). Cre, a 38-kDa
recombinase protein from bacteriophage P1, mediates recombination between or
within DNA sequences at specific locations called loxP sites (Sauer,
BioTechniques, 16:1086 (1994); Abremski et al., J. Biol. Chem., 259:1509
(1984)). These sites consist of two 13 bp inverted repeats separated by an 8 bp
spacer region that provides directionality to the recombination reaction. The 8
bp spacer region in the loxP site has a defined orientation which forces the target
gene to be transferred in a fixed orientation and reading frame. Donor vectors in
the kit contain two loxP sites, which flank the 5' end of a multiple cloning site
(MCS) and the 5' end of the open reading frame for the chloramphenicol
resistance gene. Donor vectors also contain the ampicillin resistance gene for
propagation and selection in E. coli, and the sucrase gene from B. subtilis (SacB)
for selection of correct recombinants. Acceptor vectors in the kit contain a single
loxP site, followed by a bacterial promoter, which drives expression of the
chloramphenicol marker after Cre-lox-mediated recombination. The gene of
interest, once transferred, becomes linked to the specific expression elements for
which the acceptor vector was designed. If the coding sequence for the gene of
interest is in frame with the upstream loxP site in the donor vector, it is in frame
with all peptides in the acceptor vector.

The Gateway™ Cloning System uses phage lambda-based site-specific
recombination. The LR Reaction is a recombination reaction between an entry
clone having mutant attL sites and a vector (a Destination Vector, pDEST™)
having the corresponding mutant attR sites, mediated by a cocktail of
recombination proteins (the λ recombination proteins Int and Xis, and the
E. coli-encoded protein IHF), to create an expression clone. The BP Reaction is a
recombination reaction between an expression clone (or an attB-flanked PCR
product) and a donor vector to create an entry clone. The BP reaction permits
rapid, directional cloning of PCR products synthesized with primers containing
terminal 25 bp attB sites (+4 Gs). The result is an entry clone containing the
PCR fragment. Similarly, DNA segments flanked by attB sites in an expression
clone can be transferred to generate entry clones which can be used to move the
sequence of interest to one or more destination vectors in parallel reactions to
generate expression clones. The resultant 25 bp attB sites (attB1 on the left
(N-terminus) and attB2 on the right (C-terminus)) created by the LR reaction are
derived from the attL sites (adjacent to the gene), whereas the distal sequences
are derived from the attR sites.

However, the protein encoded by Cre-lox-based expression vectors or
other site-specific recombinase-based vectors, e.g., the Gateway™ Cloning
System, has numerous additional amino acid residues, for instance, 8 to 13, at the
N-terminus and C-terminus of the protein, which residues are encoded by the
site-specific recombination exchange sites.

Thus, what is needed is an improved method to directionally clone a
nucleic acid sequence of interest.



Summary of the Invention 

The invention provides methods and vectors for use in directional
cloning. In one embodiment, a vector comprising an open reading frame of
interest (a donor vector) comprises at least two restriction enzyme recognition
sites ("restriction enzyme sites", "restriction sites" or "recognition sites")
flanking the open reading frame (DNA sequence of interest), wherein at least
one of the flanking sites is a site for a first restriction enzyme which generates
hapaxoterministic ends, e.g., a restriction enzyme with a degenerate recognition
sequence or one which cleaves outside of a recognition sequence yielding
single-strand ends, and other vector sequences (backbone sequences) for
replication and/or maintenance of the vector in a host cell and, optionally, one or
more detectable, e.g., selectable, marker genes. In one embodiment, a donor
vector comprises at least two restriction enzyme sites flanking the open reading
frame, wherein at least one of the flanking sites is for a first restriction enzyme
which is a hapaxoterministic restriction enzyme, e.g., a restriction enzyme with a
degenerate recognition sequence, which site, once cleaved, does not result in
self-complementary single-strand DNA overhangs or blunt ends, i.e., the ends are
non-self-complementary single-strand DNA overhangs. In another embodiment,
the donor vector comprises at least two restriction enzyme sites flanking the
open reading frame, wherein at least two of the flanking sites are for a first
restriction enzyme with a hapaxomeric recognition sequence, and optionally for
the same restriction enzyme, which sites, once cleaved, yield a linear DNA
fragment which does not have self-complementary single-strand DNA overhangs
or blunt ends. Such a vector may be employed as a source of the open reading
frame to prepare a vector for expression of the linked open reading frame (a
recipient or expression vector). The backbone sequences in the recipient vector
are generally provided by an acceptor vector which contains transcriptional
regulatory sequences and optionally sequences for the production of fusion
proteins. The acceptor vector also comprises non-essential DNA sequences
flanked by at least two restriction enzyme sites for a second restriction enzyme
with a hapaxomeric recognition sequence, and optionally one or more detectable,
e.g., selectable, marker genes. In one embodiment, the two flanking restriction
enzyme sites in the acceptor vector for the second restriction enzyme are sites
which, once cleaved, do not result in self-complementary single-strand DNA
overhangs or blunt ends but yield a linear DNA fragment having single-strand
DNA overhangs that are complementary with one of the two DNA overhangs
generated by the first restriction enzyme. Once the linearized DNA fragments
are ligated to form a recipient vector, the recipient vector may be introduced to
cells, e.g., prokaryotic cells such as E. coli cells, insect cells, plant cells,
mammalian cells, or lysates thereof, or to in vitro transcription/translation
mixtures, so as to yield a transformed cell that expresses a protein encoded at
least in part by the open reading frame.
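For an SfiI-based donor vector, the requirement that the flanking sites yield non-self-complementary overhangs can be checked directly on the sequence. The following sketch assumes SfiI (GGCCNNNN^NGGCC) as the hapaxoterministic enzyme. SfiI leaves a 3-nt 3' overhang formed by the three central N's. The sketch also assumes the donor is supplied as a plain top-strand string. The toy sequence and the function names are hypothetical. The code locates the flanking sites and reports each overhang "core", written in top-strand orientation.

```python
import re

# SfiI site: GGCC NNNN^N GGCC. Zero-width lookahead so overlapping sites are found.
SFII_SITE = re.compile(r"(?=(GGCC[ACGT]{5}GGCC))")


def sfii_sites(seq: str) -> list[tuple[int, str]]:
    """Return (start, core) for each SfiI site in `seq`.

    `core` is the 3-nt overhang (site positions 6-8) read on the top strand.
    The SfiI site is an interrupted palindrome, so scanning one strand
    finds every site.
    """
    seq = seq.upper()
    return [(m.start(), m.group(1)[5:8]) for m in SFII_SITE.finditer(seq)]


def ends_are_directional(left_core: str, right_core: str) -> bool:
    """True if the two ends of the released insert cannot join each other.

    Head-to-tail joining (concatemers or circularization) requires identical
    cores; head-to-head or tail-to-tail joining would require a 3-nt core
    equal to its own reverse complement, which is impossible.
    """
    return left_core.upper() != right_core.upper()


if __name__ == "__main__":
    # Toy donor: SfiI site, short ORF (ATG AAA CCC TAA), SfiI site.
    donor = "TTGGCCATTACGGCCATGAAACCCTAAGGCCGCTCCGGCCTT"
    sites = sfii_sites(donor)
    print(sites)  # [(2, 'TTA'), (27, 'CTC')]
    print(ends_are_directional(sites[0][1], sites[1][1]))  # True
```

Because the degenerate positions are free, the two flanking SfiI sites of a donor can be given different cores. This is what makes the released fragment hapaxoterministic.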

In one embodiment, the invention provides a method for the directional
subcloning of DNA fragments. The method includes providing a first vector
comprising a first selectable marker gene and a DNA sequence of interest, which
DNA sequence of interest is flanked by at least two restriction enzyme sites,
wherein at least two of the flanking restriction enzyme sites are sites for a first
restriction enzyme which is a hapaxoterministic restriction enzyme, and wherein
digestion of the first vector with the first restriction enzyme generates a first
linear DNA fragment which lacks the first selectable marker gene but comprises
the DNA sequence of interest and a first pair of non-self-complementary
single-strand DNA overhangs. A second vector for the method is provided which
includes a second selectable marker gene which is distinguishable from the first
selectable marker gene and non-essential DNA sequences, optionally including a
counterselectable gene, which non-essential DNA sequences are flanked by at
least two restriction enzyme sites, wherein at least two of the flanking restriction
enzyme sites are for a second restriction enzyme which is a hapaxoterministic
restriction enzyme, wherein digestion of the second vector with the second
restriction enzyme generates a second linear DNA fragment which lacks the
non-essential DNA sequences but comprises the second selectable marker gene
and a second pair of non-self-complementary single-strand DNA overhangs, and
wherein each of the second pair of non-self-complementary single-strand DNA
overhangs is complementary to only one of the single-strand DNA overhangs of
the first pair of non-self-complementary single-strand DNA overhangs and
permits the oriented joining of the first linear DNA fragment to the second linear
DNA fragment. The first and second vectors, the first vector and the second
linear DNA fragment, or the second vector and the first linear DNA fragment are
combined in a suitable buffer with one or more of the restriction enzymes which
are hapaxoterministic restriction enzymes and optionally DNA ligase under
conditions effective to result in digestion and optionally ligation to yield a
mixture optionally comprising a third vector comprising the first and second
linear DNA molecules which are joined in an oriented manner via the first and
second pairs of non-self-complementary single-strand DNA overhangs. In one
embodiment, ligase is added simultaneously with the one or more restriction
enzymes, while in another embodiment, ligase is added subsequent to the one or
more restriction enzymes. Optionally, the mixture is introduced into a host cell,
and optionally the transformed host cells are selected for the expression of the
second selectable marker gene or against the expression of the counterselectable
gene. The method may also include identifying a third vector in which the DNA
sequence of interest has been transferred in an oriented manner to the second
linear DNA fragment. In one embodiment, the first restriction enzyme is SfiI,
SapI or an isoschizomer thereof. In one embodiment, the first restriction enzyme
is or an isoschizomer thereof and the second restriction enzyme is BglI or an
isoschizomer thereof. In one embodiment, the second restriction enzyme is EarI
or an isoschizomer thereof. In another embodiment, the first and second
restriction enzymes are the same. Optionally, the DNA sequence of interest
comprises an open reading frame comprising one or more sites for the first or
second restriction enzyme. In this embodiment, optionally, prior to digestion
with the one or more restriction enzymes, the sites for the one or more restriction
enzymes in the open reading frame are protected so as to prevent digestion, e.g.,
protected by methylation. Alternatively, prior to methylation, the flanking sites
for the first or second restriction enzyme are contacted with an oligonucleotide
complementary to the flanking restriction enzyme site and RecA. In one
embodiment, ligation and oriented joining yields a third vector encoding an
N-terminal fusion protein which is encoded by the DNA sequence of interest and
nucleic acid sequences 5' to the 3' end of the second linear DNA fragment. In
another embodiment, ligation and oriented joining yields a third vector encoding
a C-terminal fusion protein which is encoded by the DNA sequence of interest
and nucleic acid sequences 3' to the 5' end of the second linear DNA fragment.
In yet another embodiment, ligation and oriented joining yields a third vector
encoding a protein which is encoded by the DNA sequence of interest and
nucleic acid sequences 5' and 3' to the respective 3' and 5' ends of the second
linear DNA fragment. In one embodiment, ligation and oriented joining yields a
third vector encoding a fusion protein which is encoded by the DNA sequence of
interest and the exchange site(s) created by the oriented joining.
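The condition that each overhang of the second linear fragment be complementary to only one overhang of the first can also be sketched in a few lines. This is a hypothetical model, not a described implementation:

- each SfiI-type end is represented only by its 3-nt core in top-strand orientation;
- the insert is assumed to be excised between an upstream and a downstream site;
- ligation is treated as an exact match of cores.

Flipping the insert swaps its ends and reverse-complements both cores, which is what the "reverse" entry tests.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def check_junctions(donor: tuple[str, str], acceptor: tuple[str, str]) -> dict[str, bool]:
    """Compare (upstream, downstream) overhang cores of donor and acceptor.

    Cores are 3-nt strings in top-strand orientation, as produced by an
    SfiI-type enzyme. Two ends join when their cores are equal.
    """
    d_up, d_down = (c.upper() for c in donor)
    a_up, a_down = (c.upper() for c in acceptor)
    return {
        # insert enters the acceptor in the intended orientation
        "forward": d_up == a_up and d_down == a_down,
        # flipped insert: its ends swap and each core is reverse-complemented
        "reverse": revcomp(d_down) == a_up and revcomp(d_up) == a_down,
        # acceptor backbone could re-close without an insert
        "acceptor_self_closes": a_up == a_down,
        # insert could circularize or form head-to-tail concatemers
        "insert_self_closes": d_up == d_down,
    }


if __name__ == "__main__":
    # Well-chosen cores: only the forward product is possible.
    print(check_junctions(("TTA", "CTC"), ("TTA", "CTC")))
    # Poorly chosen cores: CTC is the reverse complement of GAG, so the
    # insert can also ligate in the inverted orientation.
    print(check_junctions(("GAG", "CTC"), ("GAG", "CTC")))
```

The second case shows that non-self-complementary ends are necessary but not sufficient. The two cores of a pair must also not be reverse complements of each other.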

Thus, the invention also provides a vector system for cloning. In one
embodiment, the system includes a first vector comprising a first selectable
marker gene and a DNA sequence of interest, which DNA sequence of interest is
flanked by at least two restriction enzyme sites, wherein at least two of the
flanking restriction enzyme sites are for a first restriction enzyme which is a
hapaxoterministic restriction enzyme, wherein digestion of the first vector with
the first restriction enzyme generates a first linear DNA fragment which does not
comprise the first selectable marker gene but comprises the DNA sequence of
interest and a first pair of non-self-complementary single-strand DNA overhangs,
wherein the first restriction enzyme sites are designed such that the first linear
DNA fragment can be religated directly to a second vector. The system
optionally includes a second vector, which includes a second selectable marker
gene which is distinguishable from the first selectable marker gene and
non-essential DNA sequences, optionally including a counterselectable gene,
which non-essential DNA sequences are flanked by at least two restriction
enzyme sites, wherein two or more of the flanking restriction enzyme sites in the
second vector are for a second restriction enzyme which is a hapaxoterministic
restriction enzyme, wherein digestion of the second vector with the second
restriction enzyme generates a second linear DNA fragment which lacks the
non-essential DNA sequences but comprises the second selectable marker gene
and a second pair of non-self-complementary single-strand DNA overhangs,
wherein each of the second pair of non-self-complementary single-strand DNA
overhangs is complementary to only one of the single-strand DNA overhangs of
the first pair of non-self-complementary single-strand DNA overhangs and
permits the oriented joining of the first linear DNA fragment to the second linear
DNA fragment. Further provided is a kit which includes one or more vectors of
the vector system.

Also provided is a method for producing a vector suitable for expression
of an amino acid sequence of interest. The method includes combining at least
two vectors in a suitable buffer with one or more restriction enzymes and
optionally DNA ligase under conditions effective to result in digestion and
optionally ligation to yield a mixture optionally comprising a third vector. A
first vector for use in the method includes a first selectable marker gene and a
DNA sequence of interest, which DNA sequence of interest is flanked by at least
two restriction enzyme sites, wherein two or more of the flanking restriction
enzyme sites are sites for a first restriction enzyme which is a hapaxoterministic
restriction enzyme, wherein digestion of the first vector with the first restriction
enzyme generates a first linear DNA fragment which lacks the first selectable
marker gene but comprises the DNA sequence of interest and a first pair of
non-self-complementary single-strand DNA overhangs. A second vector
comprises a second selectable marker gene which is distinguishable from the first
selectable marker gene and non-essential DNA sequences that optionally include
a counterselectable gene, which non-essential DNA sequences are flanked by two
or more restriction enzyme sites, wherein two or more of the flanking sites in the
second vector are for a second restriction enzyme which is a hapaxoterministic
restriction enzyme. Digestion of the second vector with the second restriction
enzyme generates a second linear DNA fragment which lacks the non-essential
DNA sequences but comprises the second selectable marker gene and a second
pair of non-self-complementary single-strand DNA overhangs, wherein each of
the second pair of non-self-complementary DNA overhangs is complementary to
only one of the single-strand DNA overhangs of the first pair of
non-self-complementary single-strand DNA overhangs, and permits the oriented
joining of the first linear DNA fragment to the second linear DNA fragment. In
one embodiment, the DNA sequence of interest encodes one or more domains of
one or more proteins.
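To make the oriented joining concrete, the sketch below assembles the top strand of a hypothetical third vector from a donor sequence and an acceptor sequence. It makes three assumptions:

- SfiI is used at all four sites;
- both vectors are treated as linear strings whose sites do not span the sequence origin;
- the bottom strand is ignored, since it follows by complementarity.

All names and sequences are illustrative. SfiI cuts the top strand 8 nt into its 13-nt site. As a result, each junction in the product is a hybrid site in which four of the five N positions come from one parent and one from the other. These hybrid sites correspond to the exchange sites created by the oriented joining.

```python
import re

SFII = re.compile(r"(?=GGCC[ACGT]{5}GGCC)")
TOP_CUT = 8  # SfiI cuts the top strand after position 8 of its 13-nt site


def two_sfii_sites(seq: str) -> tuple[int, int]:
    """Start positions of exactly two SfiI sites (upstream, downstream)."""
    starts = [m.start() for m in SFII.finditer(seq)]
    if len(starts) != 2:
        raise ValueError(f"expected two SfiI sites, found {len(starts)}")
    return starts[0], starts[1]


def core(seq: str, start: int) -> str:
    """3-nt overhang core (site positions 6-8) of the site at `start`."""
    return seq[start + 5:start + 8]


def assemble(donor: str, acceptor: str) -> str:
    """Top strand of the product formed by oriented joining of the insert."""
    donor, acceptor = donor.upper(), acceptor.upper()
    d_up, d_down = two_sfii_sites(donor)
    a_up, a_down = two_sfii_sites(acceptor)
    if (core(donor, d_up) != core(acceptor, a_up)
            or core(donor, d_down) != core(acceptor, a_down)):
        raise ValueError("overhangs do not pair in the intended orientation")
    insert = donor[d_up + TOP_CUT:d_down + TOP_CUT]
    # acceptor up to its upstream cut + insert + acceptor after its downstream
    # cut; the stuffer between the two acceptor sites is discarded
    return acceptor[:a_up + TOP_CUT] + insert + acceptor[a_down + TOP_CUT:]


if __name__ == "__main__":
    donor = "TTGGCCATTACGGCCATGAAACCCTAAGGCCGCTCCGGCCTT"
    acceptor = "CCCGGCCGTTATGGCCAAAAAAAAGGCCACTCGGGCCCCC"
    product = assemble(donor, acceptor)
    print(product)
    print([core(product, s) for s in two_sfii_sites(product)])  # ['TTA', 'CTC']
```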

In one embodiment, at least one restriction enzyme site flanking the open
reading frame of interest is for a restriction enzyme that recognizes an internal
palindrome, e.g., a type II enzyme such as SfiI or BglI, including but not limited
to restriction enzymes that generate more than two types of staggered ends
(DNA overhangs) due to the ambiguity in base recognition, for instance, AhdI,
AlwNI, ApaBI, BglI, BlpI, BstAPI, Bstm, BstXI, Bsu36I, DraII, DraIII, DrdI,
Eam1105I, EcoNI, PflMI, PssI, SauI, SfiI, XcmI, as well as isoschizomers
thereof, but not restriction enzymes that generate blunt ends. In another
embodiment, at least one restriction enzyme site flanking the open reading frame
of interest is for a type IIS enzyme, e.g., SapI or EarI, such as restriction
enzymes that generate ends outside of their recognition sites, including but not
limited to AarI, AceIII, AloI, BaeI, Bbr7I, BbvI, BbvII, BccI, Bce83I, BceAI,
BcefI, BcgI, BciVI, BfiI, BinI, BplI, BsaXI, BscAI, BseMII, BseRI, BsgI, BsmI,
BsmAI, BsmFI, Bsp24I, BspCNI, BspMI, BsrI, BsrDI, BstF5I, BtgZI, BtsI, CjeI,
CjePI, EciI, Eco31I, Eco57I, Eco57MI, Esp3I, FalI, FauI, FokI, GsuI, HaeIV,
HgaI, Hin4I, HphI, HpyAV, Ksp632I (EarI), MboII, MlyI, MmeI, MnlI, PleI,
PpiI, PsrI, RleAI, SapI, VapK32I, SfaNI, SspD5I, Sth132I, StsI, TaqII, TspDTI,
TspGWI, TspRI, Tth111II, as well as isoschizomers thereof. In a further
embodiment, one of the restriction enzymes is a class IIS restriction enzyme,
including but not limited to AccB7I, AceIII, AclWI, AdeI, AhdI, Alw26I, AlwI,
AlwNI, ApaBI, AspEI, AspI, AsuHPI, BbsI, BbvI, BbvII, Bce83I, BceAI, BciVI,
BfiI, BglI, BinI, BmrI, BpiI, BpmI, BpuAI, BsaI, Bse3DI, Bse4I, BseGI, BseLI,
BseRI, BsgI, BslI, BsmAI, BsmBI, BsmFI, BspMI, BsrDI, BsmI, BstAPI, BstF5I,
BstXI, Bsu6I, DraIII, DrdI, DseDI, Eam1104I, Eam1105I, EarI, EchHKI, Eco31I,
Eco57I, EcoNI, Esp1396I, Esp3I, FokI, FauI, GsuI, HgaI, HphI, MboII, MsiYI,
MwoI, NruGI, PflMI, PflFI, PleI, SfaNI, TspRI, Ksp632I, MmeI, RleAI, SapI,
SfiI, TaqII, Tth111I, Tth111II, Van91I, XagI, XcmI, or a restriction enzyme
which has the same recognition site as AccB7I, AceIII, AclWI, AdeI, AhdI,
Alw26I, AlwI, AlwNI, ApaBI, AspEI, AspI, AsuHPI, BbsI, BbvI, BbvII, Bce83I,
BceAI, BciVI, BfiI, BglI, BinI, BmrI, BpiI, BpmI, BpuAI, BsaI, Bse3DI, Bse4I,
BseGI, BseLI, BseRI, BsgI, BslI, BsmAI, BsmBI, BsmFI, BspMI, BsrDI, BsmI,
BstAPI, BstF5I, BstXI, Bsu6I, DraIII, DrdI, DseDI, Eam1104I, Eam1105I, EarI,
EchHKI, Eco31I, Eco57I, EcoNI, Esp1396I, Esp3I, FokI, FauI, GsuI, HgaI, HphI,
MboII, MsiYI, MwoI, NruGI, PflMI, PflFI, PleI, SfaNI, TspRI, Ksp632I, MmeI,
RleAI, SapI, SfiI, TaqII, Tth111I, Tth111II, Van91I, XagI, XcmI, or is AvaI,
Ama87I, BcoI, BsoBI, Eco88I, AvaII, Eco47I, Bme18I, HgiEI, SinI, BanI,
AccB1I, BshNI, Eco64I, BfmI, BstSFI, SfcI, Bpu10I, BsaMI, BscCI, BsmI,
Mva1269I, Bsh1285I, BsaOI, BsiEI, BstMCI, Bse1I, BseNI, BsrI, Cfr10I, BsiI,
BssSI, Bst2BI, BsiZI, AspS9I, Cfr13I, Sau96I, Bsp1720I, BlpI, Bpu1102I, CelII,
BstACI, BstDEI, DdeI, CpoI, CspI, RsrII, DsaI, BstDSI, Eco24I, BanII, EcoT38I,
FriOI, HgiJII, Eco130I, StyI, BssT1I, EcoT14I, ErhI, EspI, BlpI, Bpu1102I,
Bsp1720I, CelII, HgiAI, BsiHKAI, Alw21I, AspHI, Bbv12I, HinfI, PspPPI, PpuMI,
Psp5II, SanDI, SduI, Bsp1286I, BmyI, SecI, BsaJI, BseDI, SfcI, BfmI, BstSFI,
SmlI, or a restriction enzyme which has the same recognition site as AvaI,
Ama87I, BcoI, BsoBI, Eco88I, AvaII, Eco47I, Bme18I, HgiEI, SinI, BanI,
AccB1I, BshNI, Eco64I, BfmI, BstSFI, SfcI, Bpu10I, BsaMI, BscCI, BsmI,
Mva1269I, Bsh1285I, BsaOI, BsiEI, BstMCI, Bse1I, BseNI, BsrI, Cfr10I, BsiI,
BssSI, Bst2BI, BsiZI, AspS9I, Cfr13I, Sau96I, Bsp1720I, BlpI, Bpu1102I, CelII,
BstACI, BstDEI, DdeI, CpoI, CspI, RsrII, DsaI, BstDSI, Eco24I, BanII, EcoT38I,
FriOI, HgiJII, Eco130I, StyI, BssT1I, EcoT14I, ErhI, EspI, BlpI, Bpu1102I,
Bsp1720I, CelII, HgiAI, BsiHKAI, Alw21I, AspHI, Bbv12I, HinfI, PspPPI, PpuMI,
Psp5II, SanDI, SduI, Bsp1286I, BmyI, SecI, BsaJI, BseDI, SfcI, BfmI, BstSFI,
SmlI. In one embodiment, one of the restriction enzymes is AarI, AscI, BbrCI,
CspI, DraI, FseI, NotI, NruI, PacI, PmeI, PvuI, SapI, SdaI, SfiI, SgfI, SpK,
SrfI, SwaI, or a restriction enzyme that has the same recognition site as AarI,
AscI, BbrCI, CspI, DraI, FseI, NotI, NruI, PacI, PmeI, PvuI, SapI, SdaI, SfiI,
SgfI, Spa, SrfI, SwaI.
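An open reading frame may contain internal sites for the enzymes listed above, which would otherwise have to be protected, e.g., by methylation. Checking for such sites amounts to a degenerate-pattern search on both strands. The sketch below translates IUPAC recognition sequences into regular expressions. Its small enzyme table covers only a few enzymes named in this section. The recognition sequences are standard published values and should be confirmed against a current enzyme database before use. Positions are 0-based starts on the top strand.

```python
import re

# IUPAC nucleotide codes as regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Recognition sequences (5'->3') for a few enzymes named above.
SITES = {
    "SfiI": "GGCCNNNNNGGCC",
    "BglI": "GCCNNNNNGGC",
    "SapI": "GCTCTTC",
    "EarI": "CTCTTC",
    "SgfI": "GCGATCGC",
    "PmeI": "GTTTAAAC",
}


def to_regex(site: str) -> str:
    """Translate an IUPAC recognition sequence into a regex pattern."""
    return "".join(IUPAC[base] for base in site.upper())


def find_sites(seq: str, site: str) -> list[int]:
    """0-based top-strand start positions of `site` on either strand."""
    seq = seq.upper()
    rev = site.upper().translate(IUPAC_COMPLEMENT)[::-1]
    hits: set[int] = set()
    for pattern in {site.upper(), rev}:  # one pattern if the site is palindromic
        hits.update(m.start() for m in re.finditer(f"(?={to_regex(pattern)})", seq))
    return sorted(hits)


if __name__ == "__main__":
    orf = "ATGGCTCTTCAAGAAGAGCTAA"  # hypothetical ORF with internal SapI/EarI sites
    for name, site in SITES.items():
        print(name, find_sites(orf, site))
```

Every SfiI site contains a BglI site (GCCNNNNNGGC sits inside GGCCNNNNNGGCC). Both enzymes cut after the fourth N, so SfiI and BglI ends that share the same three-base core are compatible.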

In another embodiment, the invention provides a donor vector
comprising an open reading frame of interest flanked by at least two restriction
enzyme sites, one of which flanking sites is for a first restriction enzyme that has
a low frequency, e.g., fewer than about 25%, for instance, including fewer than
about 20%, 10%, 5% or even fewer, e.g., about 1%, of recognition sites in a
plurality of, for instance, 3 or more, including 100, 1,000, 10,000 or more,
cDNAs or open reading frames for a particular species (an "infrequent cutter")
and generates single-strand DNA overhangs, and the other of which flanking
sites is for a second restriction enzyme that has a low frequency of recognition
sites in a plurality of cDNAs or open reading frames for a particular species, for
instance, the same species as for the first restriction enzyme, and generates ends
that are not complementary to the overhangs generated by the first restriction
enzyme. In one embodiment, the second restriction enzyme generates blunt ends
(a "blunt cutter"). The frequency of a particular restriction enzyme recognition
site in one or more nucleic acid molecules can be determined by methods well
known in the art. For instance, databases with a plurality of cDNA sequences or
open reading frames for a particular organism may be employed to determine
such a frequency. A donor vector of the invention may be employed as a source
of the open reading frame of interest to prepare a recipient vector of the
invention. The backbone sequences in the recipient vector are generally
provided by an acceptor vector having transcriptional regulatory sequences of
interest and optionally sequences for the production of fusion proteins. The
acceptor vector also comprises non-essential DNA sequences flanked by at least
two restriction enzyme sites, and one or more detectable marker genes. In one
embodiment, one of the flanking sites in the acceptor vector is for a third
restriction enzyme which generates single-strand DNA overhangs, which
single-strand DNA overhangs are complementary with the single-strand DNA
overhangs produced when the donor vector is digested with the first restriction
enzyme. The other flanking site in the acceptor vector is for a fourth restriction
enzyme which generates ends that are not complementary to the ends generated
by the first or third restriction enzyme but are compatible with, i.e., can be
ligated to, ends generated by the second restriction enzyme. In one embodiment,
the second and fourth restriction enzymes are blunt cutters and the restriction
sites for the second and fourth restriction enzymes are not recognized by the
same restriction enzyme. In one embodiment, the open reading frame encodes
one or more domains of one or more proteins.
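The frequency of a recognition site across a collection of cDNAs or open reading frames, used above to define an infrequent cutter, can be estimated as the fraction of sequences carrying at least one site. The sketch below makes three assumptions:

- recognition sequences are exact (non-degenerate);
- a tiny made-up sequence list stands in for a real cDNA database;
- SgfI (GCGATCGC) and PmeI (GTTTAAAC) serve only as example infrequent cutters taken from the lists above.

For comparison, the sketch also gives the number of sites expected by chance in a random sequence with equal, independent base frequencies. Real coding sequences do not follow this idealization.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_site(seq: str, site: str) -> bool:
    """True if the exact site occurs on either strand of `seq`."""
    seq, site = seq.upper(), site.upper()
    return site in seq or revcomp(site) in seq


def site_frequency(seqs: list[str], site: str) -> float:
    """Fraction of sequences containing at least one copy of `site`."""
    if not seqs:
        raise ValueError("need at least one sequence")
    return sum(has_site(s, site) for s in seqs) / len(seqs)


def expected_sites(length: int, site: str) -> float:
    """Expected site count in a random sequence with 25% of each base.

    A k-nt site can start at (length - k + 1) positions; a non-palindromic
    site can also match the bottom strand, doubling the expectation.
    """
    k = len(site)
    strands = 1 if site.upper() == revcomp(site) else 2
    return max(length - k + 1, 0) * strands * 0.25 ** k


# Made-up stand-ins for a cDNA/ORF database.
toy_cdnas = ["ATGGCGATCGCAAATAA", "ATGCCCGGGTAA", "ATGGTTTAAACTGA", "ATGAAATTTGGGTAG"]

if __name__ == "__main__":
    for name, site in {"SgfI": "GCGATCGC", "PmeI": "GTTTAAAC"}.items():
        print(name, site_frequency(toy_cdnas, site), expected_sites(1500, site))
```

With a real database, the same fraction, computed per enzyme, is what would be compared against thresholds such as about 25%, 5% or 1%.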

Thus, the invention provides a method for the directional subcloning of
DNA fragments. The method includes providing a first vector comprising a first
selectable marker gene and a DNA sequence of interest, which DNA sequence of
interest is flanked by at least two restriction enzyme sites, wherein at least one of
the flanking restriction enzyme sites is a site for a first restriction enzyme which
has infrequent restriction sites in cDNAs or open reading frames from at least
one species and generates complementary single-strand DNA overhangs,
wherein at least one of the flanking restriction enzyme sites is for a second
restriction enzyme which has infrequent restriction sites in cDNAs or open
reading frames from at least one species and generates ends that are not
complementary to the overhangs generated by the first restriction enzyme,
wherein digestion of the first vector with the first restriction enzyme and the
second restriction enzyme generates a first linear DNA fragment which lacks
the first selectable marker gene but comprises the DNA sequence of interest.
Also provided is a second vector comprising a second selectable marker gene
which is distinguishable from the first selectable marker gene and non-essential
DNA sequences, optionally including a counterselectable gene, which non-
essential sequences are flanked by at least two restriction enzyme sites, wherein
at least one of the flanking restriction enzyme sites in the second vector is for a
third restriction enzyme which generates single-strand DNA overhangs that are
complementary to the single-strand DNA overhang generated by the first
restriction enzyme in the first linear DNA fragment, wherein at least one of the
flanking restriction sites in the second vector is for a fourth restriction enzyme
which generates ends that are not complementary to the ends generated by the
first or third restriction enzyme but can be ligated to the ends generated by the
second restriction enzyme, and wherein digestion of the second vector with the
third restriction enzyme and the fourth restriction enzyme generates a second
linear DNA fragment which lacks non-essential DNA sequences but comprises
the second selectable marker, which second linear DNA fragment is flanked by
ends which permit the oriented joining of the first linear DNA fragment to the
second linear DNA fragment. The first and second vectors, the first vector and
the second linear DNA fragment, or the second vector and the first linear DNA
fragment are combined in a suitable buffer with one or more restriction enzymes
and optionally DNA ligase under conditions effective to result in digestion and
optionally ligation to yield a mixture optionally comprising a third vector
comprising the first and second linear DNA molecules joined in an oriented
manner. Optionally, prior to digestion with the one or more restriction enzymes,
the sites for the one or more restriction enzymes in the open reading frame are
protected so as to prevent digestion. In one embodiment, the sites are protected
by methylation and, optionally, prior to methylation, the flanking sites for the
first or second restriction enzyme are contacted with RecA and an
oligonucleotide complementary to the flanking restriction enzyme site. In one
embodiment, the second restriction enzyme generates blunt ends and the first
linear DNA fragment is flanked by a first single-strand DNA overhang and a
blunt end. In one embodiment, the first and third restriction enzymes are not the
same. In another embodiment, the second and fourth restriction enzymes are not
the same or each generates blunt ends. In another embodiment, the DNA
sequence of interest comprises an open reading frame comprising one or more
sites for the first or second restriction enzyme.
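The end-matching rule that makes this subcloning directional can be sketched in Python. This is a simplified model, not part of the described method: each DNA end is reduced to a type and an overhang sequence, SgfI and PvuI stand in for the first and third enzymes (a pairing this description names as one embodiment), and a generic blunt end stands in for the second and fourth enzymes. All helper names are hypothetical.

```python
# Simplified model of DNA end compatibility for oriented cloning.
# An end is (kind, overhang): kind is "blunt", "5'" or "3'"; the overhang
# is the single-stranded bases read 5'->3' ("" for blunt ends).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def compatible(end_a, end_b):
    """True if two ends can be ligated to each other."""
    kind_a, over_a = end_a
    kind_b, over_b = end_b
    if kind_a != kind_b:
        return False
    if kind_a == "blunt":
        return True
    # Sticky ends anneal when one overhang is the reverse complement of the other.
    return over_a == revcomp(over_b)


def orientations(fragment, vector):
    """List the insert orientations in which both junctions can be ligated.

    fragment = (left_end, right_end) of the linear insert;
    vector = (upstream_end, downstream_end) of the opened backbone.
    """
    f_left, f_right = fragment
    v_up, v_down = vector
    result = []
    if compatible(v_up, f_left) and compatible(f_right, v_down):
        result.append("forward")
    if compatible(v_up, f_right) and compatible(f_left, v_down):
        result.append("reverse")
    return result


SGFI = ("3'", "AT")    # SgfI: GCGAT^CGC
PVUI = ("3'", "AT")    # PvuI: CGAT^CG
BLUNT = ("blunt", "")  # e.g. PmeI: GTTT^AAAC

insert = (SGFI, BLUNT)    # first linear DNA fragment
backbone = (PVUI, BLUNT)  # second linear DNA fragment
print(orientations(insert, backbone))
print(compatible(backbone[0], backbone[1]))
```

In this model only the forward orientation has two ligatable junctions, and the backbone's sticky end cannot pair with its own blunt end, so the opened acceptor cannot simply close on itself.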

Further provided is a vector system for cloning. The vector system
includes a first vector comprising a first selectable marker gene and a DNA
sequence of interest, which DNA sequence of interest is flanked by at least two
restriction enzyme sites, wherein at least one of the flanking restriction enzyme
sites is a site for a first restriction enzyme which has infrequent restriction sites
in cDNAs or open reading frames from at least one species and generates
complementary single-strand DNA overhangs, wherein at least one of the
flanking restriction enzyme sites is for a second restriction enzyme which has
infrequent restriction sites in cDNAs or open reading frames from at least one
species and generates ends that are not complementary to the overhangs
generated by the first restriction enzyme, wherein digestion of the first vector
generates a first linear DNA fragment which lacks the first selectable marker
gene but comprises the DNA sequence of interest, and wherein the restriction
enzyme sites are designed such that the first linear DNA fragment can be
religated directly to a second vector. The second vector includes a second
selectable marker gene which is distinguishable from the first selectable marker
gene and non-essential DNA sequences, optionally including a counterselectable
gene, which non-essential DNA sequences are flanked by at least two restriction
enzyme sites, wherein at least one of the flanking restriction enzyme sites in the
second vector is for a third restriction enzyme which generates single-strand
DNA overhangs which are complementary to the single-strand DNA overhangs
generated by the first restriction enzyme, wherein at least one of the flanking
restriction sites in the second vector is for a fourth restriction enzyme which
generates ends that are not complementary to the ends generated by the first or
third restriction enzyme but can be ligated to the ends generated by the second
restriction enzyme. Digestion of the second vector with the third and fourth
restriction enzymes generates a second linear DNA fragment which lacks the
non-essential DNA sequences but comprises the second selectable marker gene,
wherein the second linear DNA fragment is flanked by ends which permit the
oriented joining of the first linear DNA fragment to the second linear DNA
fragment. A kit comprising one or more of the vectors of the vector system is
also provided.
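A toy model can show how the two distinguishable markers and the optional counterselectable gene enrich for the desired product after transformation. It is a sketch under stated assumptions: "marker1" and "marker2" are placeholder names, cells are selected for the second marker only, and the counterselectable gene is treated as always lethal in the host. Partially digested or otherwise religated molecules are not modeled.

```python
# Hypothetical bookkeeping of which plasmids survive selection after the
# first (donor) and second (acceptor) vectors are digested and ligated.
from dataclasses import dataclass


@dataclass(frozen=True)
class Plasmid:
    name: str
    markers: frozenset       # selectable markers carried (placeholder names)
    counterselectable: bool  # carries a counterselectable (lethal) gene


def survives(p, selected_marker):
    """A transformant grows only with the selected marker and no lethal gene."""
    return selected_marker in p.markers and not p.counterselectable


candidates = [
    Plasmid("uncut first vector", frozenset({"marker1"}), False),
    Plasmid("uncut second vector", frozenset({"marker2"}), True),
    Plasmid("insert joined to second-vector backbone", frozenset({"marker2"}), False),
]

for p in candidates:
    print(f"{p.name}: {'grows' if survives(p, 'marker2') else 'no growth'}")
```

Selecting for the second marker removes carry-over of the first vector, and the counterselectable gene removes uncut second vector, leaving the recombinant as the only class that grows in this model.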

In one embodiment, the second restriction enzyme generates blunt ends
and the first linear DNA fragment is flanked by a first single-strand DNA
overhang and a blunt end. In one embodiment, the first and third restriction
enzymes are not the same. In another embodiment, the second and fourth
restriction enzymes are not the same or each generates blunt ends. For instance,
in one embodiment, one of the restriction enzymes is AarI, AscI, BbrCI, CspI,
DraI, FseI, NotI, NruI, PacI, PmeI, PvuI, SapI, SdaI, SfiI, SgfI, SplI, SrfI, SwaI,
or a restriction enzyme which has the same recognition site as AarI, AscI, BbrCI,
CspI, DraI, FseI, NotI, NruI, PacI, PmeI, PvuI, SapI, SdaI, SfiI, SgfI, SplI, SrfI or
SwaI.
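Whether an enzyme has "infrequent restriction sites" in a given open reading frame can be checked by scanning for its recognition sequence. The Python sketch below covers a subset of the eight-base cutters listed above, using their standard recognition sequences; the dictionary and function names are illustrative only. For random sequence a given 8-bp site is expected roughly once every 4^8 = 65,536 bp, which is why such enzymes seldom cut within a single open reading frame.

```python
# Locate recognition sites for several 8-bp "rare cutters" in a sequence.
# All of these sites are palindromic, so scanning one strand finds every site.

RARE_CUTTERS = {
    "AscI": "GGCGCGCC",
    "FseI": "GGCCGGCC",
    "NotI": "GCGGCCGC",
    "PacI": "TTAATTAA",
    "PmeI": "GTTTAAAC",
    "SgfI": "GCGATCGC",
    "SrfI": "GCCCGGGC",
    "SwaI": "ATTTAAAT",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Map each enzyme with at least one site to its 0-based start positions."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in RARE_CUTTERS.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq.startswith(site, i)]
        if positions:
            hits[enzyme] = positions
    return hits


orf = "ATGGCGATCGCAAAGTTTAAACTGA"  # toy ORF with internal SgfI and PmeI sites
print(find_sites(orf))
```

An open reading frame that returns hits for the enzymes chosen as flanking sites would need those internal sites protected, e.g., by methylation, or a different enzyme pair.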

In one embodiment, at least one restriction enzyme site flanking the open
reading frame of interest is for one of SgfI, PvuI or PacI, restriction enzymes
which generate ends compatible with SgfI, e.g., SgfI, PvuI, BstKTI or Pflcl, or
restriction enzymes that yield ends that can be selected to have the proper 3' TA
overhang, e.g., AasI, Bce83I, BsiBI, BcgI, BpmI, BpuEI, BseMI, Bse3DI,
BseMII, BseRI, BsgI, BspCNI, BsrDI, BstF5I, BseGI, BtsI, DrdI, DseDI, EciI,
Eco57MI, Eco57I, BceS2I, GsuI, MmeI, TspDTI, Tth111II, BspKT5I, AcuI,
BspKT6I, Eco57MI, TaqII, TspGWI, or isoschizomers thereof. In one
embodiment, at least one restriction enzyme site flanking the open reading frame
of interest is for one of SgfI (AsiSI), PacI, or PvuI (AfallMI, Afa16RI, BspCI,
EagBI, ErhB9I, MvrI, NblI, Ple19I, PsulSll, RshI, XorII).
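The shared end produced by SgfI, PvuI and PacI follows from where each enzyme cuts its palindromic site. The Python sketch below derives the overhang from the top-strand cut position, assuming a palindromic site cut symmetrically on the two strands; the cut positions shown are the standard ones for these enzymes. The overhang is reported read 5' to 3' (AT); read from the protruding 3' end, as the bottom strand of SEQ ID NO:71 is written, it is the "3' TA" overhang referred to in the text.

```python
# Derive the end left by a palindromic recognition site from the top-strand
# cut. For a palindrome of length n cut after base k on the top strand, the
# bottom strand is cut after base n - k (in top-strand coordinates).

def cut_end(site: str, k: int) -> tuple[str, str]:
    """Return (end type, overhang read 5'->3') for a palindromic site."""
    n = len(site)
    if k == n - k:
        return ("blunt", "")
    if k > n - k:
        return ("3'", site[n - k:k])  # top strand protrudes past the bottom cut
    return ("5'", site[k:n - k])


ENZYMES = {                   # recognition site, top-strand cut position
    "SgfI": ("GCGATCGC", 5),  # GCGAT^CGC
    "PvuI": ("CGATCG", 4),    # CGAT^CG
    "PacI": ("TTAATTAA", 5),  # TTAAT^TAA
    "PmeI": ("GTTTAAAC", 4),  # GTTT^AAAC
    "NotI": ("GCGGCCGC", 2),  # GC^GGCCGC, shown for contrast
}

for name, (site, k) in ENZYMES.items():
    print(name, cut_end(site, k))
```

Because AT is its own reverse complement, ends from any of these three enzymes can anneal with one another, while PmeI yields a blunt end.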

In another embodiment, at least one restriction enzyme site flanking the
open reading frame of interest is for PmeI (MssI), DrdI, AhaIII (DraI, PauAII,
SruI), NruI (Bsp68I, MluB2I, Sbo13I, SpoI), SnaBI (BstSNI, Eco105I), SrfI, or
SwaI (BstRZ246I, BstSWI, MspSWI, SmiI). In another embodiment, at least one
restriction enzyme site flanking the open reading frame of interest is for a
restriction enzyme that generates a blunt end which can create a stop codon after
ligation with another blunt end, for instance, one that can create a stop codon
after ligation with an end generated by PmeI, e.g., EcdBCSl (TCKjA), SciI
(CTC^GAG), HincII (GTC^GAC, a version of GTYRAC), HpaI (GTT^AAC),
HincII (GTT^AAC, a version of GTYRAC), DraI (TTT^AAA), SwaI
(ATTT^AAAT), or an isoschizomer thereof, or for a restriction enzyme that
yields ends that can be selected to have a blunt end such as 5'GA, 5'AG or 5'AA,
e.g., BsaBI, Cac8I, Hpy8I, MslI, PshAI, SspD5I, or an isoschizomer thereof.
For example, ligation of ends generated by PmeI and DraI can create a stop site,
as would ligation of NTT and GAN, NCT and AGN, or NTT and AAN, wherein
each N is A, T, G or C. In one embodiment, the exchange site formed from
blunt end ligation of an end generated by PmeI and that of another blunt cutter
can yield a coding sequence for a protein fusion. For instance, ligation of an
open reading frame terminating in an end generated by PmeI and an end
generated by BalI, BfrBI, BsaAI, BsaBI, BsrBI, BtrI, Cac8I, CdiI, CviJI, CviRI,
Eco47III, Eco105I, EcoICRI, EcoRV, FnuDII, FspAI, HaeI, HaeIII, Hpy8I, LpnI,
MslI, MscI, MstI, NaeI, NlaIV, NruI, NspBII, OliI, PmaCI, PmeI, PshAI, PsiI,
PvuII, RsaI, ScaI, SmaI, SnaBI, SrfI, SspI, SspD5I, StuI, XcaI, XmnI, ZraI or an
isoschizomer thereof can extend the open reading frame at the 3' end.
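The stop-codon rule for PmeI junctions can be checked codon by codon. This Python sketch assumes, following the N1N2N3GTTTN4N5 arrangement this document gives for PmeI-derived junctions, that the GTT of the PmeI half-site is read as an in-frame codon, so the next codon is T followed by the first two bases of the joined blunt end. The half-site sequences come from the enzyme sites quoted above; the function names are hypothetical.

```python
# Does joining a blunt end to the upstream PmeI half-site (...GTTT) create a stop?
# Assumption: "GTT" is an in-frame codon, so the junction codon is "T" plus the
# first two bases of the partner end.

STOPS = {"TAA", "TAG", "TGA"}


def junction_codon(blunt_end_start: str) -> str:
    """Codon formed across the PmeI half-site / blunt-end junction."""
    return "T" + blunt_end_start[:2].upper()


def creates_stop(blunt_end_start: str) -> bool:
    return junction_codon(blunt_end_start) in STOPS


# Downstream half-sites left by blunt cutters named in the text.
partners = {
    "DraI (TTT^AAA)": "AAA",
    "HpaI (GTT^AAC)": "AAC",
    "SciI (CTC^GAG)": "GAG",
    "HincII (GTC^GAC)": "GAC",
    "SwaI (ATTT^AAAT)": "AAAT",
    "EcoRV (GAT^ATC)": "ATC",  # a partner that extends the frame instead
}

for enzyme, start in partners.items():
    print(enzyme, junction_codon(start), creates_stop(start))
```

Under this framing, partners whose ends begin with AA, AG or GA terminate translation, while a partner such as EcoRV (junction codon TAT) reads through and extends the open reading frame.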

In one embodiment, the first restriction enzyme is SgfI and, optionally, the
second restriction enzyme is PmeI. In another embodiment, the third restriction
enzyme generates a 3' TA overhang, e.g., the third restriction enzyme is PvuI or
PacI.

In one embodiment, the invention provides a method to directionally
clone a DNA sequence of interest which employs a recipient vector comprising a
DNA sequence of interest, e.g., optionally encoding a fusion protein, flanked by
at least two restriction enzyme sites, one of which is for a first restriction enzyme
that has a low frequency of recognition sites in a plurality of cDNAs or open
reading frames for a particular species and generates single-strand DNA
overhangs, and the other of which is for a second restriction enzyme that has a
low frequency of recognition sites in a plurality of cDNAs or open reading
frames for a particular species and generates blunt ends. An acceptor vector may
comprise a counter-selectable marker flanked by at least two restriction enzyme
sites. One of the flanking sites in the acceptor vector is for a third restriction
enzyme which generates single-strand DNA overhangs which are complementary
with the single-strand DNA overhangs produced when the recipient vector is
digested with the first restriction enzyme. The other flanking site in the acceptor
vector is for a fourth restriction enzyme which generates blunt ends. The method
includes contacting the recipient vector with the first and second restriction
enzymes and the acceptor vector with the third and fourth restriction enzymes,
ligating the resulting linear molecules, transforming a host cell with the ligation
mixture, and selecting for host cells with desirable recombinant molecules, i.e.,
vectors with the DNA sequence of interest and the acceptor vector backbone,
e.g., vectors which lack the counter-selectable gene and optionally include a
selectable marker present on the acceptor vector backbone. In one embodiment,
the first and third restriction enzymes are the same. In one embodiment, the
second and fourth restriction enzymes are the same. In this manner, DNA
sequences of interest may be moved from one expression vector to another, for
instance, to express a fusion protein encoded by a fusion of acceptor vector
sequences, the exchange site(s), and the DNA sequence of interest.

The invention also provides a method for producing a vector suitable for
expression of an amino acid sequence of interest. The method includes
combining at least two vectors in a suitable buffer with one or more restriction
enzymes and optionally DNA ligase under conditions effective to result in
digestion and optionally ligation to yield a mixture optionally comprising a third
vector. A first vector includes a first selectable marker gene and a DNA
sequence of interest, which DNA sequence of interest is flanked by at least two
restriction enzyme sites, wherein at least one of the flanking restriction enzyme
sites is a site for a first restriction enzyme which has infrequent restriction sites
in cDNAs or open reading frames from at least one species and generates
complementary single-strand DNA overhangs, wherein at least one of the
flanking restriction enzyme sites is for a second restriction enzyme which has
infrequent restriction sites in cDNAs or open reading frames from at least one
species and generates ends that are not complementary to the overhangs
generated by the first restriction enzyme, wherein digestion of the first vector
generates a first linear DNA fragment which lacks the first selectable marker
gene but comprises the DNA sequence of interest. A second vector includes a
second selectable marker gene which is distinguishable from the first selectable
marker gene and non-essential DNA sequences, optionally including a
counterselectable gene, which non-essential DNA sequences are flanked by at
least two restriction enzyme sites, wherein at least one of the flanking
restriction enzyme sites in the second vector is for a third restriction enzyme
which generates single-strand DNA overhangs which are complementary to the
single-strand DNA overhangs generated by the first restriction enzyme, wherein
at least one of the flanking restriction sites in the second vector is for a fourth
restriction enzyme which generates ends that are not complementary to the ends
generated by the first or third restriction enzyme but can be ligated to the ends
generated by the second restriction enzyme. Digestion of the second vector with
the third and fourth restriction enzymes generates a second linear DNA fragment
which lacks the non-essential DNA sequences but comprises the second
selectable marker gene, wherein the second linear DNA fragment is flanked by
ends which permit the oriented joining of the first linear DNA fragment to the
second linear DNA fragment. In one embodiment, the second restriction enzyme
generates blunt ends and the first linear DNA fragment is flanked by a first
single-strand DNA overhang and a blunt end. In another embodiment, the first
and third restriction enzymes are not the same. In yet another embodiment, the
second and fourth restriction enzymes are not the same. In yet a further
embodiment, the second and fourth restriction enzymes generate blunt ends.
In one embodiment, ligation and oriented joining yields a third vector
encoding an N-terminal fusion protein which is encoded by the DNA sequence of
interest and nucleic acid sequences 5' to the 3' end of the second linear DNA
fragment. In another embodiment, ligation and oriented joining yields a third
vector encoding a C-terminal fusion protein which is encoded by the DNA
sequence of interest and nucleic acid sequences 3' to the 5' end of the second
linear DNA fragment. In another embodiment, ligation and oriented joining
yields a third vector encoding a fusion protein which is encoded by the DNA
sequence of interest and nucleic acid sequences 5' and 3' to the respective 3' and
5' ends of the second linear DNA fragment. In yet another embodiment, ligation
and oriented joining yields a third vector encoding a fusion protein encoded by
the DNA sequence of interest and the exchange site(s) created by the oriented
joining. Optionally, the fusion protein is a GST fusion protein, GFP fusion
protein, thioredoxin fusion protein, maltose binding protein fusion protein,
protease cleavage site fusion protein, metal binding domain fusion protein or
dehalogenase fusion protein, and/or is more soluble, easier to purify, or easier to
detect relative to the corresponding non-fusion protein.

The methods of the invention thus employ one or more restriction
enzymes that generate unique ends, and optionally ligase, to clone an open
reading frame of interest. Vectors with one or more sites for restriction enzymes
that provide unique ends are particularly useful in directional cloning and
ordered gene assembly. Moreover, the vectors and methods of the invention are
easy to use, inexpensive, fast, and automatable, and they result in high-fidelity
transfer of open reading frames. Further, the vectors may be designed to express
fusion proteins with no, or one to a few (e.g., fewer than 7), amino acid residues
fused to the N-terminus, C-terminus, or both the N- and C-termini. For instance,
fusions generated with SfiI sites flanking the DNA sequence of interest may
yield fusion proteins with 4 amino acid residues at the N-terminus and
C-terminus, while fusions generated with SgfI/PmeI or SapI sites flanking the
DNA sequence of interest may yield fusion proteins with a single amino acid
residue only at the C-terminus. If SgfI or PmeI sites are added to a DNA
sequence of interest, e.g., using an amplification reaction, an additional 3-5 bp
flanking the recognition site may be included to increase cleavage efficiency.
Moreover, N- and/or C-terminal fusions with fusion partner sequences useful in
purification, e.g., immobilization, solubilization, in situ detection, protein
domain studies, and protein-protein interactions, e.g., in vitro or in vivo, may be
prepared, wherein fusion partner sequences are encoded by acceptor vector
sequences and/or exchange sites.
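The 3-5 bp of extra flanking sequence can be pictured with a hypothetical amplicon layout. The Python sketch below assumes an ATG-initiated open reading frame, places the SgfI site immediately upstream in the GCGATCGCCATG arrangement of SEQ ID NO:1, and places the PmeI site directly after the last sense codon so that the N1N2N3GTTT arrangement described in this document results. The clamp sequence GGAA, the keep_stop option and the function name are placeholders, not values from the text.

```python
# Hypothetical amplicon layout: clamp + SgfI site + C + ORF + PmeI site + clamp.
# The clamp stands for the 3-5 extra bases that aid cleavage near DNA ends.

SGFI = "GCGATCGC"
PMEI = "GTTTAAAC"
STOPS = ("TAA", "TAG", "TGA")


def flank_orf(orf: str, clamp: str = "GGAA", keep_stop: bool = False) -> str:
    """Return a hypothetical SgfI/PmeI-flanked amplicon sequence for an ORF."""
    orf = orf.upper()
    if not 3 <= len(clamp) <= 5:
        raise ValueError("clamp should be 3-5 bp")
    if not orf.startswith("ATG") or len(orf) % 3:
        raise ValueError("expected an ATG-initiated ORF of whole codons")
    if not keep_stop and orf[-3:] in STOPS:
        orf = orf[:-3]  # drop the stop so the PmeI junction can be read through
    if SGFI in orf or PMEI in orf:
        raise ValueError("ORF contains an internal SgfI or PmeI site")
    # GCGATCGC + C puts the start codon in the GCGATCGCCATG context; the last
    # ORF codon (N1N2N3) is followed directly by GTTTAAAC.
    return clamp + SGFI + "C" + orf + PMEI + clamp


amplicon = flank_orf("ATGAAACCCTAA")
print(amplicon)
```

Rejecting open reading frames with internal SgfI or PmeI sites mirrors the requirement that the flanking enzymes cut infrequently; alternatively, such sites could be protected before digestion.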

Also provided is a recombinant host cell useful to reduce unintended
expression from a vector. In one embodiment, the host cell is deficient in one or
more inducible genes, for instance, the host cell does not express one or more
rhamnose catalytic genes, e.g., the host cell is rhaBAD⁻, and comprises an
expression vector, e.g., one which is stably introduced to the host cell. The
expression vector comprises an inducible promoter for the one or more genes,
which promoter has a low level of uninduced expression and preferably has a
relatively slow induction profile but high final levels of expression, e.g., a
rhaBAD promoter, and which promoter is operably linked to an open reading
frame, such as one for a heterologous (non-native) transcription regulatory gene
product, e.g., an RNA polymerase. In one embodiment, the recombinant host cell
is deficient in rhamnose catabolism and has a recombinant DNA molecule
comprising a rhamnose-inducible promoter operably linked to an open reading
frame for a heterologous RNA polymerase. In one embodiment, the host cell is
a prokaryotic cell, for instance, an E. coli cell. In one embodiment, the
heterologous RNA polymerase is a phage RNA polymerase, such as a T7 RNA
polymerase. The recombinant host cell may be contacted with an expression
vector comprising a promoter for the heterologous RNA polymerase and an open
reading frame of interest, and with rhamnose, e.g., either simultaneously or
sequentially.

Thus, the invention provides a method of inducing expression of a DNA
sequence of interest in a host cell. The method includes contacting a
recombinant host cell which is deficient in rhamnose catabolism, and has a
recombinant DNA molecule comprising a rhamnose-inducible promoter
operably linked to an open reading frame for a heterologous RNA polymerase,
with rhamnose and an expression vector comprising a promoter for the
heterologous RNA polymerase operably linked to a DNA sequence of interest.
In one embodiment, the DNA sequence of interest is flanked by two restriction
enzyme sites, wherein one of the flanking restriction enzyme sites is for a first
restriction enzyme which has infrequent restriction sites in cDNAs or open
reading frames from at least one species and generates single-strand DNA
overhangs, and wherein another flanking restriction enzyme site is for a second
restriction enzyme which has infrequent restriction sites in cDNAs or open
reading frames from at least one species and generates ends that are not
complementary to the overhangs generated by the first restriction enzyme. In
one embodiment, the expression vector comprises a transcription terminator
sequence, e.g., rrnB, and a promoter 5' to the open reading frame of interest,
which promoter is upregulated by the heterologous transcription regulatory gene
product, as well as restriction sites for one or more infrequent cutters flanking
the open reading frame, and optionally, in the vector backbone, a selectable
marker gene, a sequence which specifies a high vector copy number, and a
sequence which reduces vector multimerization, e.g., cer. An expression vector
comprising a promoter such as one for a heterologous transcription regulatory
gene product, such as an RNA polymerase, which promoter is operably linked to
an open reading frame of interest, may also be employed in an in vitro
transcription/translation system.

Further provided is an isolated nucleic acid fragment encoding barnase
which lacks a secretory domain (signal), a vector comprising the nucleic acid
fragment, such as one which comprises a promoter, for instance, a λPL promoter,
linked to the nucleic acid fragment, isolated protein encoded by the nucleic acid
fragment, and a host cell comprising the vector. Optionally, the host cell
expresses barstar. In one embodiment, the host cell expresses barstar from a
promoter which is constitutively expressed in prokaryotic cells. Optionally, the
host cell is an E. coli cell. In one embodiment, an open reading frame for barstar
is expressed from a 4c promoter. In one embodiment, the vector system of the
invention includes a second vector comprising a counterselectable gene
comprising a nucleic acid fragment encoding a barnase which lacks a secretory
domain. For instance, the invention provides a method comprising introducing a
vector comprising a nucleic acid fragment encoding a barnase which lacks a
secretory domain into a recombinant host cell which expresses barstar from a
promoter which is constitutively expressed in prokaryotic cells.

Also provided is a method comprising introducing the vector system of
the invention into a host cell, wherein the second vector comprises a
counterselectable gene comprising a nucleic acid fragment encoding a barnase
which lacks a secretory domain.

Also provided is a vector comprising an open reading frame 3' to a DNA
fragment of no more than 30 base pairs. The DNA fragment comprises a
ribosome binding site, an SgfI recognition site, and a sequence which, when
present in mRNA, enhances the binding of the mRNA to the small subunit of a
eukaryotic ribosome. In one embodiment, the DNA fragment includes
AAGGAGCGATCGCCATGX (SEQ ID NO: 1), wherein X is A, T, G or C.
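The arrangement within SEQ ID NO:1 can be checked by locating each element. The Python sketch assumes that AAGGAG is the ribosome binding site; the positions of the SgfI site (GCGATCGC) and the ATG follow directly from the sequence. Variable names are illustrative.

```python
# Locate the elements of SEQ ID NO:1 (AAGGAGCGATCGCCATGX), omitting X.
# Assumption: AAGGAG is the ribosome binding site.

SEQ1 = "AAGGAGCGATCGCCATG"
RBS, SGFI = "AAGGAG", "GCGATCGC"

rbs_start = SEQ1.find(RBS)
sgfi_start = SEQ1.find(SGFI)
atg_start = SEQ1.find("ATG", sgfi_start)       # first ATG after the SgfI site
overlap = (rbs_start + len(RBS)) - sgfi_start  # bases shared by RBS and SgfI

print(rbs_start, sgfi_start, atg_start, overlap)
print(len(SEQ1) + 1)  # length including X, within the 30 bp limit
```

The one-base overlap between the ribosome binding site and the SgfI site is the arrangement the text describes as optional for vectors of this kind.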

Further provided is a vector comprising an SgfI recognition site, a
sequence which comprises ATG and which, when present in mRNA, enhances the
binding of the mRNA to the small subunit of a eukaryotic ribosome, and an open
reading frame which begins at the ATG in the sequence.

The invention also includes a vector comprising a recognition site for a
first restriction enzyme that generates a 3' TA overhang which is 5' to a
recognition site for a second restriction enzyme that generates blunt ends, which
vector, once digested with the first and second restriction enzymes and ligated to
a DNA fragment comprising an open reading frame flanked by an end generated
by SgfI and an end generated by a third restriction enzyme which has infrequent
restriction sites in cDNAs or open reading frames from at least one species and
generates blunt ends, yields a recombinant vector comprising the open reading
frame. In one embodiment, the second and third restriction enzymes are the
same. In another embodiment, the recognition site for the first restriction
enzyme is a recognition site for SgfI.

Also provided is a vector comprising a first open reading frame which
includes a recognition site for a first restriction enzyme that generates a 3' TA
overhang, and a recognition site, not in the open reading frame, for a second
restriction enzyme that generates blunt ends, which vector, once digested with the
first and second restriction enzymes and ligated to a DNA fragment comprising a
second open reading frame flanked by an end generated by SgfI and an end
generated by a third restriction enzyme which has infrequent restriction sites in
cDNAs or open reading frames from at least one species and generates blunt
ends, yields a recombinant vector comprising a third open reading frame
comprising the first and second open reading frames, which third open reading
frame encodes a fusion peptide or protein.

Further provided is a vector comprising a ribosome binding site which
optionally overlaps by one nucleotide with an SgfI recognition site, and a
recognition site for a first restriction enzyme that generates blunt ends, which
vector, once digested with SgfI and the first restriction enzyme and ligated to a
DNA fragment comprising an open reading frame encoding a peptide or
polypeptide flanked by

5'   CGCCATGX1Y1   (SEQ ID NO:2)
3' TAGCGGTACX2Y2   (SEQ ID NO:71)

and a blunt end generated by a second restriction enzyme that has infrequent
restriction sites in cDNAs or open reading frames from at least one species and
generates blunt ends, yields a recombinant vector which encodes the peptide or
polypeptide, wherein X1 is the first codon which is 3' to the start codon for the
open reading frame, wherein X2 is the complement of X1, wherein Y1 is the
remainder of the open reading frame, and wherein Y2 is the complement of Y1.
In one embodiment, X1 = GR1R2, wherein R1 or R2 = A, T, C or G.
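The two strands of the SgfI-generated end shown above are related by base pairing plus a two-base 3' extension on the bottom strand. This Python sketch rebuilds the fixed part of SEQ ID NO:71 from the fixed part of SEQ ID NO:2; the helper name is hypothetical, and the variable X1Y1/X2Y2 portion is omitted.

```python
# Rebuild the bottom strand (written 3'->5' under the top strand) of an
# SgfI end. SgfI (GCGAT^CGC) leaves the bottom strand with a 3' "TA" extension.

PAIR = str.maketrans("ACGT", "TGCA")


def sgfi_end_bottom(top: str) -> str:
    """Bottom strand, written 3'->5', beneath the top strand of an SgfI end."""
    return "TA" + top.upper().translate(PAIR)


top = "CGCCATG"  # fixed part of SEQ ID NO:2, before X1Y1
bottom = sgfi_end_bottom(top)
print("5'   " + top)
print("3' " + bottom)
```

Because the bottom strand is written 3' to 5', each base is simply complemented in place (no reversal), matching the statement that X2 and Y2 are the complements of X1 and Y1.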

Further provided is a vector comprising a first open reading frame which
includes a PmeI recognition site and is flanked at the 5' end by a recognition site
for a first restriction enzyme that generates complementary single-strand DNA
overhangs, which vector, once digested with PmeI and the first restriction
enzyme, and ligated to a DNA fragment comprising a blunt end at the 5' end of a
second open reading frame and an end generated by a second restriction enzyme
which generates single-strand DNA overhangs which are complementary to the
single-strand DNA overhangs generated by the first restriction enzyme, yields a
recombinant vector comprising a third open reading frame comprising the first
and second open reading frames. In one embodiment, the third open reading
frame includes N1N2N3GTTTN4N5R (SEQ ID NO:72), wherein N1N2N3 and
TN4N5 are codons that are not stop codons, and wherein R is one or
more codons. In another embodiment, the blunt end of the DNA fragment is
generated by a restriction enzyme other than PmeI. In yet another embodiment,
the blunt end of the DNA fragment is generated by PmeI digestion.
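Whether a given junction keeps the fused open reading frame open can be checked by reading SEQ ID NO:72 in codons. In this Python sketch the codon grouping follows the text (N1N2N3, GTT, TN4N5, then R); the example bases are arbitrary and the function names are hypothetical.

```python
# Read N1N2N3 GTT TN4N5 R in frame and report whether it contains a stop codon.

STOPS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list[str]:
    """Split seq into complete in-frame triplets, ignoring trailing bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def junction_is_open(n123: str, n45: str, downstream: str = "") -> bool:
    """True if N1N2N3 GTT TN4N5 followed by R has no in-frame stop codon."""
    seq = (n123 + "GTT" + "T" + n45 + downstream).upper()
    return not any(codon in STOPS for codon in codons(seq))


print(junction_is_open("CTG", "CC", "GGT"))  # CTG GTT TCC GGT
print(junction_is_open("CTG", "AA", "GGT"))  # CTG GTT TAA GGT
```

The first call models a fusion in which the first and second open reading frames are joined in frame; the second models a junction that terminates translation at TAA.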

The invention further includes a vector comprising a first open reading
frame which includes a PmeI recognition site and is flanked at the 5' end by a site
for a first restriction enzyme that generates complementary single-strand DNA
overhangs. The vector, once digested with PmeI and the first restriction enzyme,
and ligated to a DNA fragment comprising a blunt end and an end generated by a
second restriction enzyme which generates single-strand DNA overhangs which
are complementary to the single-strand DNA overhangs generated by the first
restriction enzyme, yields a recombinant vector which includes
N1N2N3GTTTN4N5, wherein N1N2N3GTTT is a sequence from the 3' end of the
digested expression vector. In one embodiment, the triplet N1N2N3 is not a stop
codon, and N4 and N5 = A, or N4 = A and N5 = G, or N4 = G and
N5 = A. In another embodiment, the triplet N1N2N3 is a stop codon. In
one embodiment, the blunt end of the DNA fragment is generated by PmeI
digestion. In another embodiment, the blunt end of the DNA fragment is
generated by a restriction enzyme other than PmeI.
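The three N4N5 combinations listed above are exactly those for which TN4N5 is a stop codon (TAA, TAG, TGA). A brute-force check in Python over all 16 possible base pairs confirms this:

```python
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}

# Every N4N5 pair for which the junction codon T-N4-N5 is a stop codon.
stop_pairs = sorted(n4 + n5 for n4, n5 in product("ACGT", repeat=2)
                    if "T" + n4 + n5 in STOPS)
print(stop_pairs)  # ['AA', 'AG', 'GA']
```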

The invention provides a recombinant vector prepared by digesting a
vector comprising a recognition site for a first restriction enzyme that generates a
3' TA overhang, which is 5' to a recognition site for a second restriction enzyme
which generates blunt ends, with the first and second restriction enzymes and
ligating the digested vector to a DNA fragment comprising an open reading
frame flanked by an end generated by SgfI and an end generated by a third
restriction enzyme which has infrequent restriction sites in cDNAs or open
reading frames from at least one species and generates blunt ends.

Also provided is a support comprising a plurality of recombinant vectors,
one or more of which comprise a different open reading frame. At least one of
the recombinant vectors comprises a promoter and a first open reading frame
which is flanked by two exchange sites. The exchange sites are formed by
ligation of a vector comprising the promoter which is 5' to a recognition site for
a first restriction enzyme that generates a 3' TA overhang which is 5' to a
recognition site for a second restriction enzyme which generates blunt ends, which
vector is digested with the first and second restriction enzymes, and a DNA
sequence comprising the first open reading frame flanked by an end generated
by SgfI and an end generated by a third restriction enzyme which has infrequent
restriction sites in cDNAs or open reading frames from at least one species and
generates blunt ends. A library of recombinant cells comprising the at least one
recombinant vector or a library of vectors comprising the at least one
recombinant vector is also provided.

In another embodiment, the support comprises a plurality of recombinant
vectors, two or more of which comprise an open reading frame for a different
polypeptide. At least one recombinant vector comprises a promoter and a first
open reading frame comprising a second open reading frame and one or more
codons which are in-frame with the second open reading frame, wherein the
second open reading frame is flanked by two exchange sites. The exchange sites
are formed by ligation of a DNA sequence comprising the second open reading
frame, which includes a PmeI recognition site and is flanked at the 5' end by a
recognition site for a first restriction enzyme that generates complementary
single-strand DNA overhangs, which DNA sequence is digested with PmeI and
the first restriction enzyme, and a vector comprising a blunt end at the 5' end
which is 5' to the one or more in-frame codons and the promoter which is 5' to
an end generated by a second restriction enzyme which generates single-strand
DNA overhangs which are complementary to the single-strand DNA overhangs
generated by the first restriction enzyme. A library of recombinant cells
comprising the at least one recombinant vector or a library of vectors comprising
the at least one recombinant vector is also provided.

Also provided is a support comprising a plurality of recombinant vectors,
two or more of which comprise an open reading frame for a different
polypeptide, wherein at least one recombinant vector comprises a promoter and
an open reading frame which is flanked by two exchange sites. The exchange
sites are formed by ligation of a DNA sequence comprising the open reading
frame which is flanked by at least two restriction enzyme sites for a first
restriction enzyme which is a hapaxoterministic restriction enzyme, which DNA
sequence is digested with the first restriction enzyme to generate a first DNA
fragment flanked by a first pair of non-self complementary single-strand DNA
overhangs, and a vector comprising the promoter and non-essential DNA
sequences that are flanked by two restriction enzyme sites for a second
restriction enzyme which is a hapaxoterministic restriction enzyme, which vector
is digested with the second restriction enzyme to generate a second DNA
fragment which lacks non-essential DNA sequences and is flanked by a second
pair of non-self complementary single-strand DNA overhangs. Each of the
second pair of the non-self-complementary DNA overhangs is complementary to
only one of the single-strand DNA overhangs of the first pair of non-self
complementary single-strand DNA overhangs. A library of recombinant cells
comprising the at least one recombinant vector or a library of vectors comprising
the at least one recombinant vector is also provided.
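The compatibility rule described above can be checked in silico. The following Python sketch is illustrative only and rests on stated assumptions: each overhang is written 5' to 3' on its own strand, so two overhangs anneal when one is the reverse complement of the other, and a "non-self complementary" pair is taken to mean that neither overhang anneals to itself or to the other end of the same fragment. The function names and the 3-base overhang sequences are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def anneals(a: str, b: str) -> bool:
    """Two overhangs (each written 5'->3') base-pair if one is the reverse complement of the other."""
    return len(a) == len(b) and revcomp(a) == b.upper()


def is_non_self_complementary_pair(o1: str, o2: str) -> bool:
    """Neither overhang pairs with itself, and the two ends of one fragment cannot pair with each other."""
    return not (anneals(o1, o1) or anneals(o2, o2) or anneals(o1, o2))


def directional_match(insert_pair, vector_pair):
    """Map each vector overhang to the single insert overhang it pairs with, or return None."""
    if not (is_non_self_complementary_pair(*insert_pair)
            and is_non_self_complementary_pair(*vector_pair)):
        return None
    mapping = {}
    for v in vector_pair:
        partners = [i for i in insert_pair if anneals(v, i)]
        if len(partners) != 1:  # each vector end must pair with exactly one insert end
            return None
        mapping[v] = partners[0]
    # Both insert ends must be used once; otherwise the orientation is not fixed.
    return mapping if len(mapping) == 2 and len(set(mapping.values())) == 2 else None


def count_self_complementary(k: int) -> int:
    """Brute-force count of length-k overhangs that are their own reverse complement."""
    return sum(1 for bases in product("ACGT", repeat=k)
               if anneals("".join(bases), "".join(bases)))


# Hypothetical overhangs: the vector ends are the reverse complements of the insert ends.
print(directional_match(("TTA", "CCT"), ("TAA", "AGG")))  # {'TAA': 'TTA', 'AGG': 'CCT'}
print(count_self_complementary(3), count_self_complementary(4))  # 0 16
```

The brute-force count illustrates why odd-length overhangs, such as the three unspecified bases left by AlwNI or SfiI, can never be self-complementary: the middle base would have to be its own complement.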

The invention further provides a method to prepare a support comprising
a plurality of recombinant vectors or recombinant cells. The method includes
selecting a plurality of recombinant vectors or recombinant cells comprising
recombinant vectors, wherein two or more of the recombinant vectors comprise
an open reading frame for a different polypeptide, wherein at least one
recombinant vector comprises a promoter and a first open reading frame which
is flanked by two exchange sites. The exchange sites are formed by ligation of a
vector comprising the promoter which is 5' to a recognition site for a first
restriction enzyme that generates a 3' TA overhang, which is 5' to a recognition
site for a second restriction enzyme which generates blunt ends, which vector is
digested with the first and second restriction enzymes, and a DNA sequence
comprising the first open reading frame flanked by an end generated by SgfI and
an end generated by a third restriction enzyme which has infrequent restriction
sites in cDNAs or open reading frames from at least one species and generates
blunt ends. The selected recombinant vectors or recombinant cells are then
introduced to one or more receptacles of the support.
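Because the open reading frame must not be cut internally, a candidate insert can be screened before cloning. This sketch uses SgfI and PmeI, the enzyme pair named elsewhere in this description, as an example and models only the top strand: SgfI is taken to cut GCGAT^CGC and PmeI GTTT^AAAC, both sites being palindromic so that one strand suffices, and the 2-base 3' overhang left by SgfI on the complementary strand is ignored. The function names and test sequences are hypothetical.

```python
# Recognition sequences and top-strand cut offsets (bases from the start of each site).
# Both sites are palindromes, so searching one strand finds every occurrence.
SITES = {
    "SgfI": ("GCGATCGC", 5),  # GCGAT^CGC: staggered cut, 2-base 3' overhang
    "PmeI": ("GTTTAAAC", 4),  # GTTT^AAAC: blunt cut
}


def find_all(seq: str, site: str) -> list:
    """Start positions of every (possibly overlapping) occurrence of `site` in `seq`."""
    hits, start = [], seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def excise_insert(seq: str, five_prime: str = "SgfI", three_prime: str = "PmeI"):
    """Top-strand fragment between the two cuts, or None if a site is missing,
    repeated (it would also cut inside the insert) or in the wrong order."""
    seq = seq.upper()
    (site5, cut5), (site3, cut3) = SITES[five_prime], SITES[three_prime]
    hits5, hits3 = find_all(seq, site5), find_all(seq, site3)
    if len(hits5) != 1 or len(hits3) != 1:
        return None
    left, right = hits5[0] + cut5, hits3[0] + cut3
    return seq[left:right] if left < right else None


print(excise_insert("AAGCGATCGCATGGCTAAAGTTTAAACTT"))       # CGCATGGCTAAAGTTT
print(excise_insert("AAGCGATCGCATGGTTTAAACAAAGTTTAAACTT"))  # None (internal PmeI site)
```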

Further provided is a method to prepare a support comprising a plurality
of recombinant vectors or recombinant cells. In this embodiment, a plurality of
recombinant vectors or recombinant cells comprising recombinant vectors is
selected, wherein two or more of the recombinant vectors comprise an open
reading frame for a different polypeptide, wherein at least one recombinant
vector comprises a promoter and a first open reading frame comprising a second
open reading frame and one or more codons which are in-frame with the second
open reading frame, wherein the second open reading frame is flanked by two
exchange sites. The exchange sites are formed by ligation of a DNA sequence
comprising the second open reading frame which includes a PmeI recognition
site and is flanked at the 5' end by a recognition site for a first restriction enzyme
that generates complementary single-strand DNA overhangs, which DNA
sequence is digested with PmeI and the first restriction enzyme, and a vector
comprising a blunt end at the 5' end which is 5' to the one or more codons and
the promoter which is 5' to an end generated by a second restriction enzyme
which generates single-strand DNA overhangs which are complementary to the
single-strand DNA overhangs generated by the first restriction enzyme. The
selected recombinant vectors or recombinant cells are introduced to one or more
receptacles of the support.

In one embodiment, the invention provides a method to prepare a support
comprising a plurality of recombinant vectors or recombinant cells, which
includes selecting a plurality of recombinant vectors or recombinant cells
comprising recombinant vectors, wherein two or more of the recombinant
vectors comprise an open reading frame for a different polypeptide. At least one
recombinant vector comprises a promoter and an open reading frame which is
flanked by two exchange sites, wherein the exchange sites are formed by ligation
of a DNA sequence comprising the open reading frame which is flanked by at
least two restriction enzyme sites for a first restriction enzyme which is a
hapaxoterministic restriction enzyme, which DNA sequence is digested with the
first restriction enzyme to generate a first DNA fragment flanked by a first pair
of non-self complementary single-strand DNA overhangs, and a vector
comprising the promoter and non-essential DNA sequences that are flanked by
two restriction enzyme sites for a second restriction enzyme which is a
hapaxoterministic restriction enzyme, which vector is digested with the second
restriction enzyme to generate a second DNA fragment which lacks
non-essential DNA sequences and is flanked by a second pair of non-self
complementary single-strand DNA overhangs. Each of the second pair of the
non-self-complementary DNA overhangs is complementary to only one of the
single-strand DNA overhangs of the first pair of non-self complementary
single-strand DNA overhangs. The selected recombinant vectors or recombinant
cells are introduced to one or more receptacles of the support.

Also provided is a method to prepare a plurality of mutagenized
recombinant vectors. The method includes providing DNAs comprising a
plurality of mutagenized open reading frames flanked by a recognition site for a
first restriction enzyme that generates a 3' TA overhang and a site for a second
restriction enzyme which has infrequent restriction sites in cDNAs or open
reading frames from at least one species and generates blunt ends. The DNAs
are digested with the first and second restriction enzymes and ligated to a vector
comprising a promoter which is 5' to an SgfI recognition site which is 5' to a
recognition site for a third restriction enzyme which generates blunt ends, which
vector is digested with SgfI and the third restriction enzyme, yielding a plurality
of mutagenized recombinant vectors.

In one embodiment, DNAs comprising a plurality of mutagenized open
reading frames are flanked by an SgfI recognition site and a site for a first
restriction enzyme which has infrequent restriction sites in cDNAs or open
reading frames from at least one species and generates blunt ends, and the DNAs
are digested with SgfI and the first restriction enzyme and ligated to a vector
comprising a promoter which is 5' to a recognition site for a second restriction
enzyme that generates a 3' TA overhang, which is 5' to a recognition site for a
third restriction enzyme which generates blunt ends, which vector is digested
with the second and third restriction enzymes, yielding a plurality of
mutagenized recombinant vectors.

The invention also includes a method to prepare a plurality of
mutagenized recombinant vectors, which includes providing DNAs comprising a
plurality of mutagenized open reading frames flanked by two restriction enzyme
sites for a first restriction enzyme which is a hapaxoterministic restriction
enzyme and generates a first pair of non-self complementary single-strand DNA
overhangs. The DNAs are digested with the first restriction enzyme and ligated
to a vector comprising a promoter and non-essential DNA sequences flanked by
two restriction enzyme sites for a second restriction enzyme which is a
hapaxoterministic restriction enzyme, which vector is digested with the second
restriction enzyme generating a DNA fragment which lacks non-essential DNA
sequences but comprises a second pair of non-self complementary single-strand
DNA overhangs, wherein each of the second pair of the non-self-complementary
DNA overhangs is complementary to only one of the single-strand DNA
overhangs of the first pair of non-self complementary single-strand DNA
overhangs, yielding a plurality of mutagenized recombinant vectors.

The vectors of the invention, and methods of the invention which employ
the vectors, are particularly useful in directional cloning of open reading frames.
However, the vectors and methods of the invention are useful in other
applications. For example, they may be employed to prepare probes, e.g.,
radioactive or nonradioactive probes, for instance, using vectors with promoters
specific for a polymerase, such as bacteriophage polymerases, to prepare
single-strand sense or anti-sense probes or therapeutic antisense RNA; or to
insert a gene in an antisense orientation such that it is not expressed or is
expressed only after structural rearrangement (conditional gene inactivation),
e.g., via recombination with Cre/lox (U.S. Patent No. 5,658,772), FLP/FRT, the
Gin recombinase of Mu, the Pin recombinase of E. coli, and the R/RS system of
the pSR1 plasmid.

Also provided is a method for performing genetic analysis. The method
comprises populating a database of genetic data with genetic data to create a
plurality of genetic records. The database containing genetic data is queried to
identify a first subset of genetic records, wherein each record has at least one
recognition site for restriction enzymes included in a set of predetermined
restriction enzymes, and a set of statistics associated with the restriction enzyme
recognition sites for at least a second subset of genetic records in the first subset
is determined.
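A minimal sketch of this method follows, assuming an in-memory SQLite table named `records` (columns `id` and `seq`) as the database of genetic data, recognition sites written with IUPAC codes, and a search of both strands. The "first subset" is taken to be the records having at least one site for a given enzyme, and the statistics reported are the number of such records and the site count per record. The table layout, function names and toy records are hypothetical.

```python
import re
import sqlite3

# IUPAC nucleotide codes as regular-expression character classes, and a complement table.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def site_pattern(site: str) -> re.Pattern:
    """Match a site on either strand; the lookahead lets overlapping hits be counted."""
    rc = site.translate(COMPLEMENT)[::-1]
    alternatives = sorted({"".join(IUPAC[b] for b in s) for s in (site, rc)})
    return re.compile("(?=(" + "|".join(alternatives) + "))")


def site_statistics(conn: sqlite3.Connection, enzymes: dict) -> dict:
    """For each enzyme: how many records contain its site, and how many times per record."""
    patterns = {name: site_pattern(site) for name, site in enzymes.items()}
    stats = {name: {"records_with_site": 0, "occurrences": {}} for name in enzymes}
    for rec_id, seq in conn.execute("SELECT id, seq FROM records"):
        for name, pattern in patterns.items():
            n = len(pattern.findall(seq.upper()))
            if n:  # record belongs to the subset having at least one site
                stats[name]["records_with_site"] += 1
                stats[name]["occurrences"][rec_id] = n
    return stats


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, seq TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)", [
    ("orf1", "ATGGCGATCGCAAAGTTTAAACTGA"),
    ("orf2", "ATGCCCGGGTTTTGA"),
    ("orf3", "ATGGTTTAAACGTTTAAACTAA"),
])
stats = site_statistics(conn, {"SgfI": "GCGATCGC", "PmeI": "GTTTAAAC", "SapI": "GCTCTTC"})
print(stats["PmeI"])  # {'records_with_site': 2, 'occurrences': {'orf1': 1, 'orf3': 2}}
```

Searching the reverse complement matters only for asymmetric sites such as SapI; for palindromic sites the two alternatives collapse into one pattern, so no site is counted twice.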

In one embodiment, determining the set of statistics includes determining
a number of genetic records including recognition sites for one predetermined
restriction enzyme or for each of the predetermined restriction enzymes in the
set. In another embodiment, determining the set of statistics includes
determining a number of occurrences of at least one site for the one
predetermined restriction enzyme or for the predetermined restriction enzymes in
a genetic record in the second subset. In yet another embodiment, the genetic
records comprise nucleic acid sequences. In one embodiment, the method
further includes filtering the subset of genetic records to include or exclude
genetic records having one or more selected characteristics. In yet another
embodiment, the method further includes filtering the subset of genetic records
to exclude genetic records having a size greater than a predetermined value. In
one embodiment, the predetermined value is 21,000 characters. The method may
also include determining the sequence of specific bases which are present as
ambiguous bases within a recognition site or which are present between a
recognition site for a restriction enzyme and the position at which the restriction
enzyme cleaves DNA containing the recognition site. In one embodiment, at
least one of the restriction enzymes has a 6 bp, 7 bp or 8 bp recognition site. In
one embodiment, at least one of the restriction enzymes is a hapaxoterministic
restriction enzyme.
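The size filter and the base-extraction step can be sketched the same way. Here the size limit is applied in the SQL query (`length(seq) <= 21000`), and for each forward-strand match the bases occupying the ambiguous positions of the site are returned together with an optional number of downstream bases, e.g., up to the farther cleavage point of an enzyme that cuts outside its site. SfiI (GGCCNNNNNGGCC) and SapI (GCTCTTC, cleaving 1 and 4 bases downstream) serve as examples; reverse-strand matches are not handled, and the table layout and function names are hypothetical.

```python
import re
import sqlite3

# IUPAC nucleotide codes as regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}


def records_up_to(conn: sqlite3.Connection, max_len: int = 21000) -> list:
    """Return (id, seq) rows, excluding records longer than the predetermined size."""
    return conn.execute(
        "SELECT id, seq FROM records WHERE length(seq) <= ?", (max_len,)
    ).fetchall()


def ambiguous_bases(seq: str, site: str, flank: int = 0) -> list:
    """For each forward-strand match of `site`, return (bases at its ambiguous
    positions, the `flank` bases immediately downstream of the site)."""
    seq = seq.upper()
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in site) + ")")
    ambiguous = [i for i, b in enumerate(site) if b not in "ACGT"]
    hits = []
    for m in pattern.finditer(seq):
        start, end = m.start(), m.start() + len(site)
        inner = "".join(seq[start + i] for i in ambiguous)
        hits.append((inner, seq[end:end + flank]))  # slice is shorter near the sequence end
    return hits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, seq TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)", [
    ("rec1", "AAGGCCATTACGGCCTTCCGCTCTTCAGCATT"),
    ("too_long", "A" * 25000),
])
for rec_id, seq in records_up_to(conn):
    print(rec_id, ambiguous_bases(seq, "GGCCNNNNNGGCC"), ambiguous_bases(seq, "GCTCTTC", flank=4))
# rec1 [('ATTAC', '')] [('', 'AGCA')]
```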

Further provided is a computer-readable medium having computer-executable
instructions for performing a method for performing genetic analysis.
The method includes populating a database of genetic data with a plurality of
genetic records, querying the database of genetic data to identify a first subset of
genetic records, wherein each record has at least one recognition site for one
predetermined restriction enzyme or for restriction enzymes included in a set of
predetermined restriction enzymes, and determining a set of statistics associated
with the restriction enzyme recognition sites for at least a second subset of
genetic records in the first subset.

Also provided is a computerized system for genetic analysis. The system
includes a database of genetic data, a processor, and a set of one or more
programs executed by the processor causing the processor to query the database
of genetic data to identify a first subset of genetic records, wherein each record
has at least one recognition site for one predetermined restriction enzyme or for
restriction enzymes included in a set of predetermined restriction enzymes, and to
determine a set of statistics associated with the restriction enzyme recognition
sites for at least a second subset of genetic records in the first subset. In one
embodiment, determining the set of statistics includes determining a number of
genetic records including recognition sites for one predetermined restriction
enzyme or for each of the predetermined restriction enzymes in the set. In one
embodiment, determining the set of statistics includes determining a number of
occurrences of at least one site for the one predetermined restriction enzyme or
for the predetermined restriction enzymes in a genetic record in the second
subset. In one embodiment, the genetic records comprise nucleic acid sequences.
In one embodiment, the method further comprises filtering, or a processor is
further operable to filter, the subset of genetic records to include or exclude
genetic records having one or more selected characteristics. In another
embodiment, the method further comprises filtering, or a processor is further
operable to filter, the subset of genetic records to exclude genetic records having
a size greater than a predetermined value. In one embodiment, the predetermined
value is 21,000 characters. In another embodiment, the method further comprises
determining, or a processor is further operable to determine, a sequence of
specific bases which are present as ambiguous bases within a recognition site or
which are present between a recognition site for a restriction enzyme and the
position at which the restriction enzyme cleaves DNA containing the recognition
site.
Brief Description of the Figures

Figure 1. Exemplary hapaxomers (SEQ ID NOs:16 and 20).

Figures 2A-B. Examples of hapaxomers with 3' or 5' overhangs. A) The
symmetry of the site recognized by AlwNI, a restriction enzyme that cleaves an
interrupted palindrome within the recognition site. If the bases denoted "N" are
ignored, the site is symmetrically equivalent to a PvuII site. Arrows indicate the
cleavage sites on both strands. Note that a recognition and cleavage site on only
one strand need be stipulated owing to the existence of a two-fold axis of
symmetry. However, because cleavage by AlwNI results in DNA with overhangs
consisting of three bases with four possibilities for each unspecified base, the
sequence at the termini will be different depending on the strand. B) The FokI
recognition and cleavage sites illustrated in both orientations (SEQ ID
NOs:73-74). Because the site lacks symmetry, there are two ways to write the
bases from 5' to 3'. The cleavage sites on both strands, indicated by arrows, must
be specified in order to indicate where cutting will occur. Because the cleavage
sites are outside the recognition site, the single-stranded overhangs can be any
set of four bases. Note that AlwNI generates 3' overhangs, whereas FokI
generates 5' overhangs.

Figure 3. A flowchart to identify restriction enzymes that have
infrequent recognition sites in the genome of a particular organism.

Figure 4. Comparison of the percent of sequences in various organisms
which lack (0), have no or one (0-1), or have no, one or two (0-2) recognition
sites for SapI, SfiI or SgfI/PmeI.

Figure 5. Site frequencies of selected restriction enzymes in six species
(SEQ ID NOs:20, 55, 71 and 75-78).

Figure 6. General overview of the use of interrupted palindromes for 
directional cloning. 

Figure 7. Directional cloning using SfiI (SEQ ID NOs:7 and 80).

Figure 8. PCR interrupted palindromes cloning pathways.

Figures 9A-B. PCR interrupted palindromes cloning pathways.

Figures 10A-B. PCR interrupted palindromes cloning pathways.

Figure 11. Restriction endonucleases useful for directional cloning with
SfiI or other restriction enzymes generating 3-base 3' overhangs (SEQ ID
NOs:7, 12, 14, 20, 79 and 81-82).

Figure 12. General overview of the use of Type IIS enzymes for
directional cloning.

Figure 13. Directional cloning using SapI (SEQ ID NOs:4 and 16).

Figures 14A-B. Two enzyme approach for directional cloning with an
enzyme that generates staggered ends and an enzyme that generates blunt ends,
e.g., SgfI and PmeI.

Figure 15. Two enzyme cloning pathway with PCR entry.

Figure 16. Use of SgfI to generate N-terminal fusions or no fusion at the
N-terminus (SEQ ID NOs:83-85).

Figure 17. Use of PmeI to generate C-terminal fusions including fusions
with a single amino acid (SEQ ID NOs:86-88).

Figure 18. Use of a combination of SgfI, PmeI, PacI and SwaI to prepare
a vector encoding two proteins of interest.

Figures 19A-B. N-terminal PacI-SgfI fusion site (SEQ ID NOs:89-90)
and C-terminal PmeI-SwaI fusion site (SEQ ID NO:91).

Figure 20A. Exemplary luciferase donor and acceptor vectors of the
invention.

Figure 20B. Analysis of ligation of the donor and acceptor vector
sequences having i^^I sites flanking distinguishable luciferase genes.

Figures 21A-E. Exemplary vectors of the invention. KanR = kanamycin
resistance gene; AmpR = ampicillin resistance gene; ColE1 ori = origin of
replication sequence; cer = XerCD site-specific recombinase target site; rrnB
term = bidirectional terminator; T7 P = T7 RNA polymerase promoter;
RBS/Kozak = ribosome binding site and Kozak sequences; and T7 term = T7
RNA polymerase termination sequence.

Figure 22A. Luciferase expression after induction of expression in 3
different hosts at 37°C.

Figure 22B. Luciferase expression in 3 different hosts at 25°C, t = 0.

Figure 22C. Luciferase expression in 3 different hosts at 25°C, t = 5
hours and 21 hours.

Detailed Description of the Invention

Definitions 

The term "unique restriction enzyme site" indicates that the recognition 
sequence for a given restriction enzyme appears once within a nucleic acid 
molecule. 

The terms "polylinker" or "multiple cloning site" refer to a cluster of
restriction enzyme sites on a nucleic acid construct which are utilized for the
insertion and/or excision of nucleic acid sequences such as the coding region of a
gene, lox sites, etc.
The term "prokaryotic termination sequence" refers to a nucleic acid
sequence which is recognized by the RNA polymerase of a prokaryotic host cell
and results in the termination of transcription. Prokaryotic termination sequences
commonly comprise a GC-rich region that has a twofold symmetry followed by
an AT-rich sequence. Commonly used prokaryotic termination sequences are
the T7 and rrnB termination sequences. A variety of termination sequences are
known to the art and may be employed in the nucleic acid constructs of the
present invention including the TINT, TL1, TL2, TL3, TR1, TR2 and T6S
termination signals derived from the bacteriophage lambda and termination
signals derived from bacterial genes such as the trp gene of E. coli.

The term "eukaryotic polyadenylation sequence" (also referred to as a
"poly A site" or "poly A sequence") as used herein denotes a DNA sequence
which directs both the termination and polyadenylation of the nascent RNA
transcript. Efficient polyadenylation of the recombinant transcript is desirable as
transcripts lacking a poly A tail are unstable and are rapidly degraded. The poly
A signal utilized in an expression vector may be "heterologous" or
"endogenous." An endogenous poly A signal is one that is found naturally at the
3' end of the coding region of a given gene in the genome. A heterologous poly
A signal is one which is isolated from one gene and placed 3' of another gene. A
commonly used heterologous poly A signal is the SV40 poly A signal. The SV40
poly A signal is contained on a 237 bp BamHI/BclI restriction fragment and
directs both termination and polyadenylation (Sambrook et al., Molecular
Cloning: A Laboratory Manual, Cold Spring Harbor (1989)); numerous vectors
contain the SV40 poly A signal. Another commonly used heterologous poly A
signal is derived from the bovine growth hormone (BGH) gene; the BGH poly A
signal is available on a number of commercially available vectors. The poly A
signal from the herpes simplex virus thymidine kinase (HSV tk) gene is also
used as a poly A signal on expression vectors.

As used herein, the terms "selectable marker" or "selectable marker
gene" refer to the use of a gene which encodes an enzymatic activity that
confers the ability to grow in medium lacking what would otherwise be an
essential nutrient (e.g., the TRP1 gene in yeast cells); in addition, a selectable
marker may confer resistance to an antibiotic or drug upon the cell in which the
selectable marker is expressed. A selectable marker may be used to confer a
particular phenotype upon a host cell. When a host cell must express a selectable
marker to grow in selective medium, the marker is said to be a positive
selectable marker (e.g., antibiotic resistance genes which confer the ability to
grow in the presence of the appropriate antibiotic). Selectable markers can also
be used to select against host cells containing a particular gene (e.g., the sacB
gene which, if expressed, kills the bacterial host cells grown in medium
containing 5% sucrose); selectable markers used in this manner are referred to as
negative selectable markers or counter-selectable markers.

As used herein, the term "vector" is used in reference to nucleic acid
molecules that transfer DNA segment(s) from one cell to another. The term
"vehicle" is sometimes used interchangeably with "vector." A "vector" is a type
of "nucleic acid construct." The term "nucleic acid construct" includes circular
nucleic acid constructs such as plasmid constructs, cosmid vectors, etc., as well
as linear nucleic acid constructs (e.g., lambda phage constructs, PCR products).
The nucleic acid construct may comprise expression signals such as a promoter
and/or an enhancer (in such a case it is referred to as an expression vector).

The term "expression vector" as used herein refers to a recombinant
DNA molecule containing a desired coding sequence and appropriate nucleic
acid sequences necessary for the expression of the operably linked coding
sequence in a particular host organism. Nucleic acid sequences necessary for
expression in prokaryotes usually include a promoter, an operator (optional), and
a ribosome binding site, often along with other sequences. Eukaryotic cells are
known to utilize promoters, enhancers, and termination and polyadenylation
signals.

The terms "in operable combination", "in operable order" and "operably
linked" as used herein refer to the linkage of nucleic acid sequences in such a
manner that a nucleic acid molecule capable of directing the transcription of a
given gene and/or the synthesis of a desired protein molecule is produced. The
term also refers to the linkage of amino acid sequences in such a manner that
a functional protein is produced.

The terms "transformation" and "transfection" as used herein refer to the
introduction of foreign DNA into prokaryotic or eukaryotic cells. Transformation
of prokaryotic cells may be accomplished by a variety of means known to the art
including the treatment of host cells with CaCl2 to make competent cells,
electroporation, etc. Transfection of eukaryotic cells may be accomplished by a
variety of means known to the art including calcium phosphate-DNA
co-precipitation, DEAE-dextran-mediated transfection, polybrene-mediated
transfection, electroporation, microinjection, liposome fusion, lipofection,
protoplast fusion, retroviral infection, and biolistics.

As used herein, the terms "restriction endonucleases" and "restriction
enzymes" refer to bacterial enzymes, each of which cuts double-stranded DNA at
or near a specific nucleotide sequence.

As used herein, the term "recombinant DNA molecule" refers to a DNA
molecule which is comprised of segments of DNA joined together by means of
molecular biological techniques.

As used herein, "recognition site" refers to a sequence of specific bases
that is recognized by a restriction enzyme if the sequence is present in
double-stranded DNA; or, if the sequence is present in single-stranded RNA, the
sequence of specific bases that would be recognized by a restriction enzyme if
the RNA was reverse transcribed into cDNA and the cDNA employed as a
template with a DNA polymerase to generate a double-stranded DNA; or, if the
sequence is present in single-stranded DNA, the sequence of specific bases that
would be recognized by a restriction enzyme if the single-stranded DNA was
employed as a template with a DNA polymerase to generate a double-stranded
DNA; or, if the sequence is present in double-stranded RNA, the sequence of
specific bases that would be recognized by a restriction enzyme if either strand
of RNA was reverse transcribed into cDNA and the cDNA employed as a
template with a DNA polymerase to generate a double-stranded DNA.
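In practice every case of this definition reduces to one search: RNA is read with T in place of U, and because the double-stranded DNA that would result contains both the given strand and its complement, a site is present if it occurs on the sequence or on its reverse complement. A minimal sketch, limited to unambiguous sites; the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def has_recognition_site(seq: str, site: str) -> bool:
    """True if `site` would be recognized in the double-stranded DNA form of `seq`.

    Accepts dsDNA, ssDNA, ssRNA or dsRNA input: U is read as T, and both the
    given strand and its reverse complement are searched.
    """
    dna = seq.upper().replace("U", "T")
    site = site.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    return site in dna or site in reverse_complement


# The SapI site GCTCTTC lies on the strand complementary to this RNA.
print(has_recognition_site("AUGGAAGAGCUU", "GCTCTTC"))  # True
```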

An "open reading frame" includes at least 3 consecutive codons which 
are not stop codons. 
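Read literally, this definition can be tested by scanning each reading frame codon by codon. The sketch below assumes the standard genetic code (stop codons TAA, TAG and TGA), scans only the three forward frames, and does not require a start codon because the definition does not; the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def open_reading_frames(seq: str, min_codons: int = 3) -> list:
    """(start, end) spans of runs of >= min_codons consecutive non-stop codons,
    searched in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")
    spans = []
    for frame in range(3):
        run_start, run_len = None, 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if run_len >= min_codons:
                    spans.append((run_start, i))
                run_start, run_len = None, 0
            else:
                if run_start is None:
                    run_start = i
                run_len += 1
        if run_len >= min_codons:  # a run that reaches the end of the sequence
            spans.append((run_start, run_start + 3 * run_len))
    return sorted(spans)


print(open_reading_frames("ATGAAACCCTAA"))  # [(0, 9), (2, 11)]
```

The second span (frame 2, codons GAA ACC CTA) qualifies under the definition even though it has no start codon, which shows how permissive the "at least 3 consecutive codons" wording is.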

DNA molecules are said to have "5' ends" and "3' ends" because
mononucleotides are reacted to make oligonucleotides in a manner such that the
5' phosphate of one mononucleotide pentose ring is attached to the 3' oxygen of
its neighbor in one direction via a phosphodiester linkage. Therefore, an end of
an oligonucleotide is referred to as the "5' end" if its 5' phosphate is not linked to
the 3' oxygen of a mononucleotide pentose ring and as the "3' end" if its 3'
oxygen is not linked to a 5' phosphate of a subsequent mononucleotide pentose
ring. As used herein, a nucleic acid sequence, even if internal to a larger
oligonucleotide, also may be said to have 5' and 3' ends. In either a linear or
circular DNA molecule, discrete elements are referred to as being "upstream" or
5' of the "downstream" or 3' elements. This terminology reflects the fact that
transcription proceeds in a 5' to 3' fashion along the DNA strand. The promoter
and enhancer elements which direct transcription of a linked gene are generally
located 5' or upstream of the coding region. However, enhancer elements can
exert their effect even when located 3' of the promoter element and the coding
region. Transcription termination and polyadenylation signals are located 3' or
downstream of the coding region.

As used herein, the term "an oligonucleotide having a nucleotide
sequence encoding a gene" means a nucleic acid sequence comprising the coding
region of a gene or, in other words, the nucleic acid sequence which encodes a
gene product. The coding region may be present in either a cDNA, genomic
DNA or RNA form. When present in a DNA form, the oligonucleotide may be
single-stranded (i.e., the sense strand) or double-stranded. Suitable control
elements such as enhancers/promoters, splice junctions, polyadenylation signals,
etc. may be placed in close proximity to the coding region of the gene if needed
to permit proper initiation of transcription and/or correct processing of the
primary RNA transcript. Alternatively, the coding region utilized in the vectors
of the present invention may contain endogenous enhancers/promoters, splice
junctions, intervening sequences, polyadenylation signals, etc. or a combination
of both endogenous and exogenous control elements.

As used herein, the term "regulatory element" refers to a genetic element
which controls some aspect of the expression of nucleic acid sequences. For
example, a promoter is a regulatory element which facilitates the initiation of
transcription of an operably linked coding region. Other regulatory elements
include splicing signals, polyadenylation signals, termination signals and the
like.

Transcriptional control signals in eukaryotes comprise "promoter" and
"enhancer" elements. Promoters and enhancers consist of short arrays of DNA
sequences that interact specifically with cellular proteins involved in
transcription (Maniatis et al., Science, 236:1237 (1987)). Promoter and enhancer
elements have been isolated from a variety of eukaryotic sources including genes
in yeast, insect and mammalian cells and viruses (analogous control elements,
i.e., promoters, are also found in prokaryotes). The selection of a particular
promoter and enhancer depends on what cell type is to be used to express the
protein of interest. Some eukaryotic promoters and enhancers have a broad host
range while others are functional in a limited subset of cell types (for review see
Voss et al., Trends Biochem. Sci., 11:287 (1986) and Maniatis et al., supra
(1987)). For example, the SV40 early gene enhancer is very active in a wide
variety of cell types from many mammalian species and has been widely used
for the expression of proteins in mammalian cells (Dijkema et al., EMBO J.,
4:761 (1985)). Two other examples of promoter/enhancer elements active in a
broad range of mammalian cell types are those from the human elongation factor
1α gene (Uetsuki et al., J. Biol. Chem., 264:5791 (1989), Kim et al., Gene,
91:217 (1990) and Mizushima et al., Nuc. Acids. Res., 18:5322 (1990)) and the
long terminal repeats of the Rous sarcoma virus (Gorman et al., Proc. Natl.
Acad. Sci. USA, 79:6777 (1982)) and the human cytomegalovirus (Boshart et
al., Cell, 41:521 (1985)).

As used herein, the term "promoter/enhancer" denotes a segment of DNA
which contains sequences capable of providing both promoter and enhancer
functions (i.e., the functions provided by a promoter element and an enhancer
element, see above for a discussion of these functions). For example, the long
terminal repeats of retroviruses contain both promoter and enhancer functions.
The enhancer/promoter may be "endogenous" or "exogenous" or "heterologous."
An "endogenous" enhancer/promoter is one which is naturally linked with a
given gene in the genome. An "exogenous" or "heterologous" enhancer/promoter
is one which is placed in juxtaposition to a gene by means of genetic
manipulation (i.e., molecular biological techniques) such that transcription of
that gene is directed by the linked enhancer/promoter.

The presence of "splicing signals" on an expression vector often results
in higher levels of expression of the recombinant transcript. Splicing signals
mediate the removal of introns from the primary RNA transcript and consist of a
splice donor and acceptor site (Sambrook et al., Molecular Cloning: A
Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, New York
(1989) pp. 16.7-16.8). A commonly used splice donor and acceptor site is the
splice junction from the 16S RNA of SV40.
Efficient expression of recombinant DNA sequences in eukaryotic cells
requires expression of signals directing the efficient termination and
polyadenylation of the resulting transcript. Transcription termination signals are
generally found downstream of the polyadenylation signal and are a few hundred
nucleotides in length. The term "poly A site" or "poly A sequence" as used
herein denotes a DNA sequence which directs both the termination and
polyadenylation of the nascent RNA transcript. Efficient polyadenylation of the
recombinant transcript is desirable as transcripts lacking a poly A tail are
unstable and are rapidly degraded. The poly A signal utilized in an expression
vector may be "heterologous" or "endogenous." An endogenous poly A signal is
one that is found naturally at the 3' end of the coding region of a given gene in
the genome. A heterologous poly A signal is one which is isolated from one gene
and placed 3' of another gene.

Eukaryotic expression vectors may also contain "viral replicons" or "viral
origins of replication." Viral replicons are viral DNA sequences which allow for
the extrachromosomal replication of a vector in a host cell expressing the
appropriate replication factors. Vectors which contain either the SV40 or
polyoma virus origin of replication replicate to high copy number (up to 10^4
copies/cell) in cells that express the appropriate viral T antigen. Vectors which
contain the replicons from bovine papillomavirus or Epstein-Barr virus replicate
extrachromosomally at low copy number (about 100 copies/cell).

As used herein, the terms "nucleic acid molecule encoding," "DNA
sequence encoding," and "DNA encoding" refer to the order or sequence of
deoxyribonucleotides along a strand of deoxyribonucleic acid. The order of these
deoxyribonucleotides determines the order of amino acids along the polypeptide
(protein) chain. The DNA sequence thus codes for the amino acid sequence.

As used herein, the term "gene" means the deoxyribonucleotide
sequences comprising the coding region of a gene, e.g., a structural gene, and
including sequences located adjacent to the coding region on both the 5' and 3'
ends for a distance of about 1 kb on either end such that the gene corresponds to
the length of the full-length mRNA. The sequences which are located 5' of the
coding region and which are present on the mRNA are referred to as 5'
non-translated sequences. The sequences which are located 3' or downstream of
the coding region and which are present on the mRNA are referred to as 3'
non-translated sequences. The term "gene" encompasses both cDNA and genomic
forms of a gene. A genomic form or clone of a gene contains the coding region
interrupted with non-coding sequences termed "introns" or "intervening regions"
or "intervening sequences." Introns are segments of a gene which are transcribed
into nuclear RNA (hnRNA); introns may contain regulatory elements such as
enhancers. Introns are removed or "spliced out" from the nuclear or primary
transcript; introns therefore are absent in the messenger RNA (mRNA) transcript.
The mRNA functions during translation to specify the sequence or order of amino
acids in a nascent polypeptide.

In addition to containing introns, genomic forms of a gene may also
include sequences located on both the 5' and 3' end of the sequences which are
present on the RNA transcript. These sequences are referred to as "flanking"
sequences or regions (these flanking sequences are located 5' or 3' to the
non-translated sequences present on the mRNA transcript). The 5' flanking region
may contain regulatory sequences such as promoters and enhancers which
control or influence the transcription of the gene. The 3' flanking region may
contain sequences which direct the termination of transcription,
post-transcriptional cleavage and polyadenylation.

As used herein, the term "purified" or "to purify" refers to the removal of
contaminants from a sample.

The term "recombinant DNA molecule" as used herein refers to a DNA
molecule which is comprised of segments of DNA joined together by means of
molecular biological techniques.

The term "recombinant protein" or "recombinant polypeptide" as used
herein refers to a protein molecule which is expressed from a recombinant DNA
molecule.

The term "native protein" is used herein to indicate that a protein does
not contain amino acid residues encoded by vector sequences; that is, the native
protein contains only those amino acids found in the protein as it occurs in
nature. A native protein may be produced by recombinant means or may be
isolated from a naturally occurring source.

As used herein the term "portion" when in reference to a protein (as in "a
portion of a given protein") refers to fragments of that protein. The fragments
may range in size from two amino acid residues to the entire amino acid
sequence minus one amino acid.

As used herein, the term "fusion protein" refers to a chimeric protein
containing the protein of interest joined to a different peptide or protein
fragment. The fusion partner may, for example, enhance the solubility of a
linked protein of interest, may provide an epitope tag or affinity domain to allow
identification and/or purification of the recombinant fusion protein, e.g., from a
host cell which expresses the fusion or a culture supernatant of that cell, or both,
or may have another property or activity, e.g., two functional enzymes can be
fused to produce a single protein with multiple enzymatic activities. If desired,
the fusion partner may be removed from the protein of interest by a variety of
enzymatic or chemical means known to the art. Thus, examples of fusion
protein producing sequences useful in the vectors of the invention include
epitope tag encoding sequences, affinity domain encoding sequences, other
functional protein encoding sequences, and the like. The term
"functional protein encoding sequence", as used herein, indicates that the fusion
protein producing element of a vector encodes a protein or peptide having a
particular activity, such as an enzymatic activity, e.g., luciferase or
dehalogenase, or a binding activity, e.g., thioredoxin, and the like. For example, a
functional protein encoding sequence may encode a kinase catalytic domain
(Hanks and Hunter, FASEB J. 9:576-595, 1995), producing a fusion protein that
can enzymatically add phosphate moieties to particular amino acids, or may
encode a Src Homology 2 (SH2) domain (Sadowski et al., Mol. Cell. Biol.,
6:4396, 1986; Mayer and Baltimore, Trends Cell Biol., 3:8, 1993), producing a
fusion protein that specifically binds to phosphorylated tyrosines.

I. Restriction Enzyme Sites and Enzymes Useful in the Vector and Methods of
the Invention

The present invention employs two general approaches to directional
cloning and ordered gene assembly. In one approach, restriction sites for
hapaxoterministic restriction enzymes, e.g., those with degenerate recognition or
cleavage sequences (see Figures 1-2), are employed. Hapaxoterministic
enzymes are enzymes able to generate unique ends (Table 1). FokI, a type IIS
enzyme, is included, and so is AlwNI, an interrupted palindrome. Because the
cleavage site is located among the unspecified bases, the termini are expressed in
N's. Unless the complete nucleotide sequence within the interruption or flanking
the recognition site is written, the detailed nature of the ends cannot be stated;
statistically speaking, the single-stranded overhangs are likely to be all different.
It is also unlikely that these overhangs possess elements of symmetry. In the
general case, this means that the protruding bases are not composed of an
asymmetric unit followed by its reverse complement; the ends will not be
self-complementary; and it will not be possible to form concatemers with a
fragment bearing such ends. With non-hapaxoterministic enzymes such as EcoRI
the opposite situation prevails: both the recognition site, G^AATTC, and the
overhanging ends produced by cleavage, AATT, always display palindrome-like
elements, and the overhang of any fragment is complementary with itself and
with the protruding ends of all other fragments generated by the same enzyme.
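
Whether an overhang is self-complementary, and therefore able to anneal to copies of itself, can be checked mechanically. The Python sketch below is illustrative only (the function names are hypothetical, not from the source); it treats an overhang as a string of unambiguous bases and compares it with its reverse complement, using the EcoRI end AATT and an arbitrary three-base end as examples.

```python
# Map each unambiguous base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_self_complementary(overhang: str) -> bool:
    """True if the overhang equals its own reverse complement, i.e. a fragment
    carrying it can anneal to an identical end (self-ligation, concatemers)."""
    return overhang.upper() == reverse_complement(overhang)


def ends_anneal(a: str, b: str) -> bool:
    """Two overhangs of the same polarity pair when one is the reverse
    complement of the other."""
    return a.upper() == reverse_complement(b)


print(is_self_complementary("AATT"))  # EcoRI end: True
print(is_self_complementary("AGG"))   # an example end from an NNN overhang: False
print(ends_anneal("AGG", "CCT"))      # True
```

Under this test an odd-length overhang can never be self-complementary, because its middle base would have to be its own complement.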

Table 1

Enzyme    Top strand (5'-3')      Bottom strand (3'-5')     SEQ ID NO
AlwNI     CAGNNN^CTG              GTC^NNNGAC
DraIII    CACNNN^GTG              GTG^NNNCAC
BbsI      GAAGACNN^               CTTCTGNNNNNN^             3
EarI      CTCTTCN^                GAGAAGNNNN^               4
BbvI      GCAGCNNNNNNNN^          CGTCGNNNNNNNNNNNN^        5
Esp3I     CGTCTCN^                GCAGAGNNNNN^              6
BglI      GCCNNNN^NGGC            CGGN^NNNNCCG              7
FokI      GGATGNNNNNNNNN^         CCTACNNNNNNNNNNNNN^       8
BsaI      GGTCTCN^                CCAGAGNNNNN^              9
HgaI      GACGCNNNNN^             CTGCGNNNNNNNNNN^          10
BslI      CCNNNNN^NNGG            GGNN^NNNNNCC              11
MwoI      GCNNNNN^NNGC            CGNN^NNNNNCG              12
BsmAI     GTCTCN^                 CAGAGNNNNN^               13
PflMI     CCANNNN^NTGG            GGTN^NNNNACC              14
BsmFI     GGGACNNNNNNNNNN^        CCCTGNNNNNNNNNNNNNN^      15
SapI      GCTCTTCN^               CGAGAAGNNNN^              16
BspMI     ACCTGCNNNN^             TGGACGNNNNNNNN^           17
SfaNI     GCATCNNNNN^             CGTAGNNNNNNNNN^           18
BstXI     CCANNNNN^NTGG           GGTN^NNNNNACC             19
SfiI      GGCCNNNN^NGGCC          CCGGN^NNNNCCGG            20

Note. The cleavage sites are indicated by ^. Isoschizomers occur in several
cases. The enzymes listed and their isoschizomers are as follows: BbsI, Bs(^ll;
BbvI, Bstl, Bst7\J; BsaI, Eco31I; BsmAI, Alw26I; EarI, Ksp632I; and PflMI,
AccB7I.

Enzymes which generate blunt ends can never be hapaxoterministic, because a
blunt end has no single-stranded overhang and can therefore be joined to any
other blunt end. For instance, the restriction site for BsaBI contains N's, but the
enzyme produces blunt ends.

There are enzymes that are formally, but not functionally, hapaxomers. In
this category are restriction endonucleases that generate overhangs of only one
or two unspecified bases, such as AlwI and BpmI, respectively (Table 2).
Conversely, those type II enzymes which recognize sites with multiple
degeneracies are functionally, but not formally, hapaxomers. For example, if a
fragment were to be cut at several locations by Bsp1286I (Table 2), an array of
single-stranded extensions, e.g., GGCC, TGCA, AGCT, GGCA, GGCT, AGCC,
AGCA, TGCC, and TGCT, might occur. The first three of these possess an
obvious element of symmetry which eliminates them from consideration. The
last six protrusions do not possess an element of symmetry and, therefore, are
neither self-complementary nor self-ligatable; they have the potential to be
unique. On that basis Bsp1286I is a hapaxomer. Hapaxoterminicity is the ability
to generate a finite percentage of overhangs lacking in symmetry. The symmetry
or lack thereof of the restriction enzyme recognition site is of no consequence.
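
The Bsp1286I example can be reproduced by expanding the degenerate overhang. The sketch below assumes an overhang of the form DGCH (D = A, G or T; H = A, C or T), consistent with the degenerate positions shown for Bsp1286I in Table 2; it enumerates the nine possible ends and separates the three symmetric ones from the six asymmetric ones. The helper names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# IUPAC codes needed for this example (D = A/G/T, H = A/C/T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "D": "AGT", "H": "ACT"}


def expand(pattern: str) -> list:
    """Return every concrete sequence matched by an IUPAC pattern."""
    return ["".join(bases) for bases in product(*(IUPAC[code] for code in pattern))]


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


overhangs = expand("DGCH")
symmetric = [o for o in overhangs if o == reverse_complement(o)]
asymmetric = [o for o in overhangs if o != reverse_complement(o)]

print(symmetric)   # ['AGCT', 'GGCC', 'TGCA']
print(asymmetric)  # ['AGCA', 'AGCC', 'GGCA', 'GGCT', 'TGCC', 'TGCT']
```

Note that the six asymmetric ends form three mutually complementary pairs (AGCA/TGCT, AGCC/GGCT, GGCA/TGCC), so whether a given end is unique in practice still depends on which ends a particular digest produces.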

Table 2

Enzyme    Top strand (5'-3')          Bottom strand (3'-5')     SEQ ID NO   Comment
AlwI      GGATCNNNN^                  CCTAGNNNNN^               21          A hapaxomer with an overhang of one base
BpmI      CTGGAGNNNNNNNNNNNNNNNN^     GACCTCNNNNNNNNNNNNNN^     22          A hapaxomer with an overhang of two bases
Bsp1286I  GDGCH^C                     C^HCGDG                               An honorary hapaxomer

D = A, G or T; H = A, C or T. The cleavage sites are indicated by ^.

Bsp1286I has overhangs of four bases on each strand; two bases are
uniquely specified and two are restricted to one of three possibilities. Clearly,
the statistical probability that the ends are unique is less than that of enzymes
which generate two completely unspecified overhanging bases. Such enzymes
include BcgI, BpmI, BsaJI, BsgI, BsrDI, DrdI and Eco57I.
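
The comparison can be made concrete under a simplifying assumption: each permitted base at a degenerate position is equally likely (real sequences are not uniform, so the figures are only illustrative). At a position allowing k bases, two independent ends agree with probability k x (1/k)^2 = 1/k, so the chance that two ends are identical is the product of 1/k over all positions. The function name below is hypothetical.

```python
from fractions import Fraction

# Number of bases allowed by each IUPAC code used here.
ALLOWED = {"A": 1, "C": 1, "G": 1, "T": 1, "D": 3, "H": 3, "N": 4}


def identical_end_probability(overhang_pattern: str) -> Fraction:
    """Chance that two independent ends drawn from the pattern are identical,
    assuming all permitted bases at each position are equally likely."""
    p = Fraction(1)
    for code in overhang_pattern:
        p *= Fraction(1, ALLOWED[code])
    return p


print(identical_end_probability("DGCH"))  # Bsp1286I-type end: 1/9
print(identical_end_probability("NN"))    # two unspecified bases: 1/16
```

A larger value means two ends are more likely to coincide, i.e. less likely to be unique, which matches the comparison drawn in the text.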

In one embodiment of the invention, a donor vector is obtained or
prepared. The donor vector includes a DNA sequence of interest flanked by at
least two restriction enzyme sites, at least one of which is for a first restriction
enzyme with a degenerate recognition sequence. In another embodiment, the
DNA sequence of interest is flanked by two restriction enzyme sites for a
restriction enzyme with a degenerate recognition sequence; these sites are not
identical, so that cleavage of the donor vector with that enzyme yields a
linear DNA with non-self-complementary single-strand DNA overhangs. The
donor vector also contains at least one selectable marker gene, which optionally
is not the DNA sequence of interest, e.g., the selectable marker gene is part of
the vector backbone. The donor vector is useful to transfer the DNA sequence of
interest in an oriented manner to an acceptor vector for expression of the DNA
sequence of interest in the resulting recipient vector. The acceptor vector
contains non-essential DNA sequences flanked by at least two restriction
enzyme sites for a second restriction enzyme with a degenerate recognition
sequence which yields non-self-complementary single-strand DNA overhangs.
Those sites, once cleaved, yield single-strand DNA overhangs that are each
complementary to only one of the two single-strand DNA overhangs generated
by the first restriction enzyme. In one embodiment, the first and second
restriction enzymes are the same. In another embodiment, the first and second
restriction enzymes are different and are not isoschizomers, so that the resulting
ligated sequences (the exchange site) are not cleavable by at least one of the
restriction enzymes having a degenerate recognition sequence that is employed
to transfer the DNA sequence of interest. For example, the fusion of
single-strand DNA overhangs generated by BglI and single-strand DNA overhangs
generated by SfiI results in an exchange site that is not cleavable by SfiI, but is
cleavable by BglI. Similarly, the fusion of single-strand DNA overhangs
generated by SgfI and single-strand DNA overhangs generated by PvuI results in
an exchange site that is not cleavable by SgfI, but is cleavable by PvuI. Further,
the fusion of ends generated by PmeI and ends generated by DraI results in an
exchange site that is not cleavable by PmeI, but is cleavable by DraI.
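
Whether an exchange site can be recut is a pattern-matching question. The Python sketch below uses a hypothetical junction in which the base preceding the BglI-derived half-site is not G (if it were G, the junction would read GGCCNNNNNGGCC and the SfiI site would be restored). It checks the junction against the BglI (GCCNNNNNGGC) and SfiI (GGCCNNNNNGGCC) recognition sequences; both sites are palindromic, so searching one strand is sufficient.

```python
import re


def site_regex(site: str):
    """Compile a recognition sequence containing N wildcards into a regex."""
    return re.compile(site.replace("N", "[ACGT]"))


BGLI = site_regex("GCCNNNNNGGC")    # BglI, cuts GCCNNNN^NGGC
SFII = site_regex("GGCCNNNNNGGCC")  # SfiI, cuts GGCCNNNN^NGGCC

# Hypothetical exchange site: left part from a BglI-cut end (ATGCCAGTC),
# right part from an SfiI-cut end (AGGCCAT).
junction = "ATGCCAGTC" + "AGGCCAT"

print(bool(BGLI.search(junction)))  # True: still cleavable by BglI
print(bool(SFII.search(junction)))  # False: no longer cleavable by SfiI
```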

In another approach, a donor vector is obtained or prepared that contains
a DNA sequence of interest flanked by at least two restriction enzyme sites, one
of which is for a first restriction enzyme which has infrequent restriction sites in
cDNAs or open reading frames from at least one species and generates
single-strand DNA overhangs, and another of which is for a second restriction
enzyme that has infrequent restriction sites in cDNAs or open reading frames from
at least one species and generates ends that are not complementary to the
overhangs generated by the first restriction enzyme. In one embodiment, the
second restriction enzyme generates blunt ends. The donor vector also contains
at least one selectable marker gene which optionally is not the DNA sequence of
interest. The donor vector is useful to transfer the DNA sequence of interest in
an oriented manner to an acceptor vector for expression of the DNA sequence of
interest, resulting in a recipient vector. The acceptor vector contains
non-essential DNA sequences flanked by at least two restriction enzyme sites. In one
embodiment, the non-essential DNA sequences comprise a counter-selectable
gene, e.g., a barnase gene, a ccdB gene, or a sacB gene. One of the flanking
restriction sites in the acceptor vector is for a third restriction enzyme which
generates single-strand DNA overhangs, which overhangs are complementary to
the single-strand DNA overhangs produced by digestion of the donor vector with
the first restriction enzyme. In one embodiment, the restriction site for the third
restriction enzyme is different from the restriction site for the first restriction
enzyme, and the sites are not cleaved by the same restriction enzyme. In another
embodiment, the first and third restriction enzymes are the same. The other
flanking restriction site in the acceptor vector is for a fourth restriction enzyme
which yields ends that are not complementary to the ends generated by the first
or third restriction enzyme. In one embodiment, the second and fourth
restriction enzymes generate blunt ends. In one embodiment, the restriction site
for the fourth restriction enzyme is different from the restriction site for the
second restriction enzyme, and the sites are not cleaved by the same restriction
enzyme. In this manner, the exchange site is likely not cleavable by the second
or fourth restriction enzyme. In another embodiment, the second and fourth
restriction enzymes are the same.

Thus, by designing a donor vector and an acceptor vector with selected
restriction enzyme sites which are appropriately positioned, once these vectors
are digested with the respective restriction enzymes, the DNA sequence of
interest can only be oriented in one direction in the acceptor vector backbone.
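
The orientation argument can be sketched as a compatibility check between the two ends of the insert and the two ends of the opened acceptor vector. The model below is a simplification and an assumption of this sketch: each end is written as its single-stranded overhang (an empty string for a blunt end), all sticky ends are assumed to have the same polarity, and the function names are hypothetical. Flipping the insert swaps which of its ends faces which vector end but does not change the ends themselves.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def compatible(end_a: str, end_b: str) -> bool:
    """Blunt ends ('') join only other blunt ends; sticky ends join when one
    overhang is the reverse complement of the other."""
    if end_a == "" or end_b == "":
        return end_a == end_b
    return end_a == reverse_complement(end_b)


def orientations(insert_ends: tuple, vector_ends: tuple) -> list:
    """List the insert orientations that can ligate into the opened vector."""
    left, right = insert_ends
    v_left, v_right = vector_ends
    found = []
    if compatible(v_left, left) and compatible(right, v_right):
        found.append("forward")
    if compatible(v_left, right) and compatible(left, v_right):
        found.append("reverse")
    return found


# Two distinct, non-palindromic sticky ends: only one orientation.
print(orientations(("AGG", "TCA"), ("CCT", "TGA")))      # ['forward']
# One sticky end plus one blunt end: only one orientation.
print(orientations(("AT", ""), ("AT", "")))              # ['forward']
# Identical palindromic ends (e.g. EcoRI on both sides): both orientations.
print(orientations(("AATT", "AATT"), ("AATT", "AATT")))  # ['forward', 'reverse']
```

The second example shows why a single sticky end paired with a blunt end is enough for directionality even when the sticky overhang is itself palindromic.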

Restriction enzyme sites useful in the practice of the invention include
but are not limited to hapaxomeric sequences, sequences recognized by class II
enzymes or class IIS enzymes, as well as restriction enzyme sites recognized by
enzymes that yield blunt ends, including enzymes that are infrequent cutters
in one or more species.

Suitable class IIS restriction enzymes include those enzymes that
recognize a five-base contiguous sequence, including but not limited to the
following enzymes and their isoschizomers, which are indicated in parentheses:
Alw26I (BsmAI), AlwI (AclWI, BinI), AsuSSl (HphI), BbvI (Bsmi), BceQ,
BstF5I (BseGI, FokI), Paul, HgaI, MboII, PleI, SfaNI, and TspRI; those that
recognize a six-base contiguous sequence, including but not limited to the
following enzymes and their isoschizomers: AceIII, BbsI (BbvII, BpiI, BpuAI),
Bce83I, BciVI, BfiI (BmrI), BpmI (GsuI), BsaI (Eco31I), BseRI, BsgI, BsmBI
(Esp3I), BsmFI, BspMI, BsrDI (BseSDI), Bsu6I (Eam1104I, EarI, Ksp632I),
Eco57I, Paul, MmeI, RleAI, TaqII, and KAl 11 H. SapI and its isoschizomer
VapK32I, which recognize a seven-base sequence, and SfiI, which recognizes an
eight-base sequence, also can be used. Further examples of useful enzymes
include those that recognize a four-base pair split sequence (e.g., Bse4I (BselA,
MsiYU, BslI), MwoI) and enzymes that recognize a six-base pair split sequence
(e.g., AccB7I (Esp1396I, PflMI, Van91I), AdeI (DraIII), AhdI (AspEI,
Eam1105I, EclHKI, NruGI), AlwNI, ApaBI (BstAPI), AspI (PflFI, Tth111I),
BglI, BstXI, DrdI (DseDI), EcoNI (XagI), and XcmI). Additional suitable class
IIS restriction enzymes are known to those of skill in the art (see, for example,
Szybalski et al., 100:13 (1991)).

There are other enzymes that are not class IIS enzymes which produce
non-palindromic ends. Examples of such enzymes include but are not limited to
AvaI (Ama87I, BcoI, BsoBI, Eco88I), AvaII (Eco47I, Bme18I, HgiBI, SinI), BanI
(AccB1I, BshNI, Eco64I), BfmI (BstSFI, SfcI), Bpu10I, BsaMI (BscCI, BsmI,
Mva1269I), Bsh1285I (BsaOI, BsiEI, BstMCI), Bse1I (BseNI, BsrI, Cfr10I), BsiI
(BssSI, BsOBT), BsiZI (AspS9X, Cfr13I, Sau96I), BsplllOl (BlpI, Bpu1102I,
CelII), BstACI, BstDEI (DdeI), CpoI (CspI, RsrII), DsaI (BstDSI), Eco24I
(BanII, EcoT38I, FriOI, HgiJII), Eco130I (StyI, BssT1I, EcoT14I, ErhI), EspI
(BlpI, Bpu1102I, Bsp1720I, CelII), HgiAI (BsiHKAI, Alw21I, AspHI, Bbv12I),
HmQ, PspPPI (PpuMI, Psp5II), SanDI, SduI (Bsp1286I, BmyI), SecI (BsaJI,
BseDI), SfcI (BfmI, BstSFI), and SmR.

Other enzymes useful in the invention are those which have few
recognition sites in DNA, e.g., cDNAs, of one or more organisms (an
"infrequent cutter"). To select restriction enzyme sites for this embodiment of
the invention, analyses of sequences for a plurality of mRNAs, open reading
frames and/or cDNAs from an organism are conducted, e.g., using computer
software, to determine the relative frequency of those sites in that organism (see
Figures 3-5). For example, SapI has numerous recognition sites in human
cDNAs, occurring, e.g., in 38-43% of them, while the combination of SgfI and
PmeI, and SfiI, have relatively few recognition sites in human cDNAs, occurring,
for instance, in 2 to 3% and 13 to 14% of them, respectively. Enzymes which may
generate ends complementary to SgfI include but are not limited to Bce83I
(BpuEI), BseMII, BseRI, BsgI, BspCNI, BsrDI (BseSDI, BseMI), BstF5I (BseGI),
BtsI, DrdI (AasI, DseDI), EciI, Eco57I (AcuI, BspKT5I), Eco57MI, GsuI (BpmI),
MmeI, TaqII, TspDTI, TspGWI, mi 1 m, BspKT6I (BstKTI), PacI, PvuI (Afa22MI,
Afa16RI, BspCI, EagBI, ErhB9I, MvrI, NblI, Ple19I, PsulSll, RshI, XorII), and
SgfI (AsiSI).

Enzymes which generate blunt ends include but are not limited to AhaIII
(DraI, PauAII, SruI), AluI (MltI), BalI (MlsI, Mm31I, MmNI, MscI, Msp20I),
BfrBI, BsaAI (BstBAI, MspYI, PsuAI), BsaBI (Bse8I, BseJI, Bsh1365I, BsiBI,
BsfSSl, MamI), BsrBI (AccBSI, BstDXm, BsdXm, MbiI), BtrI (BmgBI), Cac8I
(BstC8I), CdiI, CviJI (CwTI), CviRI (HpyCH4V, HpyF44III), Eco47III (AfeI,
AitI, Aor51HI, FunI), EcolZL (EgeI, EheI, SfoI), EcoICRI (BpuAmI, Ecl136II,
Eco53kI, MxaI), EcoRV (CeqI, Eco32I, HjaI, HpyCI, NsiCI), EsaBCil, FnuDII
(AccII, BceBI, BepI, Bpt&SU, Bshmei, BspSOI, Bsp123I, BsiPm, BstUI,
Bsu1532I, BM, Csp6SKVl, CspYLVtyFaM, FauBII, MvnI, ThaI), FspAI, HaeI,
HaeIII (BanAI, BecAII, BimlSJI, Bme36\l, BseQI, BshI, BshFI, Bsp211I,
BspBRI, BspKI, BspSI, BsuSa, BteI, CltI, DsaII, EsaBC4I, FnuDI, MchAO,
MfoAI, NgoPII, NspUJ, PalI, Pde133I, PflKI, PhoI, PlaI, SbvI, SfaI, SuaI),
HindII (HinJCI, HincII), HpaI (BstE2359l, BsiHPl^, KspAI, SsrI), HpySI
(HpyBn), LpnI (BmelAll), MlyI (SchI), MslI (SmiMI), MstI (Acc16I, AosI,
AviII, FdiII, FspI, NsbI, PamI, Pun14627I), NaeI (CcoI, PdiI, SauBMKI,
SauHPI, SauLVI, Saum, SauSI, Slulim, SspCI), NlaIV (AspNL, BscBI, BspLI,
PspmJ), NruI (Bsp68I, MluB2I, Sbo13I, SpoI), NspBII (MspA1I), OliI (AleI),
PmaCI (AcvI, BbrPI, BcoM, Ecolll, PmlI), PmeI (MssI), PshAI (BoxI, BstPAI),
PsiI, PvuII (BavI, BavAI, BavBI, Bsp153AI, BspM39I, BspO4I, Cfr6I, DmaI,
EcO, NmeRI, Paeim, Pun14627II, Pvu84II, Uba153AI, UbaM39I), RsaI (AfaI,
HpyBI, PlaAII), ScaI (Acc113I, ^^I, DpaI, EcolSSl, RflFIII), Soil, SmaI
(CfrJAl, PaeBI, PspALI), SnaBI (BstSNI, Eco105I), SrfI, SspI, SspD5I, StuI
(AatI, AspMU, Eco147I, Gda, PceI, Pme55I, SarI, SniiOIfl, SseBI, SteI), SwaI
(BstRZ246I, BstSWI, MspSWI, SmiI), XcaI (BspimU, Bss^Ai, BstWm, BsmSl,
Bs(L\TJ), XmnI (Asp700I, BbvAI, MroXI, PdmI), and ZraI.

In one embodiment, the restriction enzyme site in a vector of the
invention is for a restriction enzyme that generates blunt ends and preferably has
relatively few recognition sites in a particular organism, e.g., PmeI (MssI), NruI
(Bsp68I, MluB2I, Sbo13I, SpoI), SnaBI (BstSNI, Eco105I), SrfI, and SwaI
(BstRZ246I, BstSWI, MspSWI, SmiI), as well as HpaI, HincII, PshAI, OliI,
AluI, AIW161, Baa, DraI, DpnI, Eco47III, EcoICRI, EcoRV, FokI, HaeIII,
HincII, MboI, MspA1I, NaeI, SsdU, PvuII, ScaI, SmaI, SspI, StuI, A>krI,
EcdBC3l, Soil, HincII, DraI, BsaBI, Cac8I, Hpy8I, MlyI, PshAI, SspD5I, BfrBI,
BsaAI, BsrBI, BtrI, CdiI, CviJI, CviRI, Eco47III, EcolSl, EcoICRI, FnuDII,
FspAI, HaeI, LpnI, MlyI, MslI, MstI, NaeI, NlaIV, NruI, NspBII, PmaCI,
PshAI, PsiI, SrfI, StuI, XcaI, XmnI, ZraI or an isoschizomer thereof.

II. Methods to Identify Frequencies of Recognition Sites

Figure 3 is a flowchart of a method 300 for performing a genetic analysis
according to an embodiment of the invention. The method may be performed by
one or more computer programs or modules made up of computer-executable
instructions. Describing the method by reference to a flowchart enables one
skilled in the art to develop such programs or modules including such
instructions to carry out the method on suitable computers (the processor or
processors of the computer executing the instructions from computer-readable
media such as RAM, ROM, CD-ROM, DVD-ROM, hard drives, floppy drives
and other such media). The method illustrated in Figure 3 is inclusive of acts
that may be taken by an operating environment executing an exemplary
embodiment of the invention.

A system executing the method begins by populating a database with
genetic records obtained from a source database (block 302). Populating a
database may involve some manual manipulation. In some embodiments, the
genetic records comprise gene sequences having open reading frames, e.g., from
cDNAs, or a portion thereof. In some embodiments, the database is populated
using genetic records obtained from publicly available source databases. For
example, in some embodiments human genetic data may be obtained through the
Internet using the URL (Uniform Resource Locator)
"ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/hs_flia.^^^" or the URL
"mgc.nci.nih.gov/". Genetic data for baker's yeast may be obtained using the URL
"genome-ftp.stanford.edu/pub/yeast/data_download/sequence/genomic_sequence/orf_d^".
Genetic data for E. coli may be obtained from the URL
"www.genome.wisc.edu/sequencing/k12.htm". Genetic data for C. elegans may
be obtained using the URL
"ftp.wormbase.org/pub/wormbase/confirmed_genes_current.gz". Genetic data
for Arabidopsis may be obtained using the URL
"tairpub:tairpub@ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/file=ATH1.cds".
It should be noted that no embodiment of the invention is limited to
any particular source for the genetic data, and that many publicly and privately
available sources may be utilized. In one embodiment, the genetic records
represent at least 10%, e.g., 25%, 50% or more, of the open reading
frames in the genome of a selected organism.

The data format for the source data may be different from the format
desired for the genetic database. In some embodiments, the source data is
converted to a common format for storage in the genetic database.
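
The source does not specify a file format. As an assumption for illustration only, the sketch below takes FASTA text, a format in which many sequence collections are distributed, and converts it into one common in-memory form (a dictionary from record identifier to upper-case sequence) that later steps could load into a database. The function name and the identifiers in the example are made up.

```python
def parse_fasta(text: str) -> dict:
    """Convert FASTA text into {record_id: sequence}, upper-casing bases."""
    records = {}
    current_id = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks).upper()
            fields = line[1:].split()
            # Use the first word of the header as the identifier.
            current_id = fields[0] if fields else f"record_{len(records) + 1}"
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks).upper()
    return records


example = ">seq1 hypothetical cDNA\nATGGCC\ncttga\n>seq2\nATG\n"
print(parse_fasta(example))  # {'seq1': 'ATGGCCCTTGA', 'seq2': 'ATG'}
```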

A query is issued to search for a subset of records in the genetic database
that have at least one recognition site for a predetermined restriction enzyme or
for a set of predetermined restriction enzymes (block 304). In one embodiment,
one or more predetermined restriction enzymes have a 6, 7 or 8 bp recognition
site, e.g., a set may include a predetermined restriction enzyme with a 7 bp
recognition site and another with an 8 bp recognition site. However, the present
invention is not limited to any particular number of restriction enzymes included
in the set or to a particular number of bp in the recognition site for the one or set
of predetermined restriction enzymes. The resulting subset of records may be
stored in a temporary table, in a separate results table, or in a separate database.
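
A minimal sketch of the block 304 query, assuming the records are held in a Python dictionary rather than a database engine: each recognition sequence, written with IUPAC codes, is converted into a regular expression that matches either strand, and records are kept if they contain a site for any (or, optionally, every) chosen enzyme. The function names and the `require_all` switch are hypothetical.

```python
import re

# IUPAC nucleotide codes written as regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
# Complement of each IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W, N unchanged).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def iupac_to_regex(site: str) -> str:
    return "".join(IUPAC[code] for code in site)


def site_pattern(site: str):
    """Compile a regex that matches the site on either strand."""
    reverse = site.translate(COMPLEMENT)[::-1]
    return re.compile(f"{iupac_to_regex(site)}|{iupac_to_regex(reverse)}")


def records_with_sites(records: dict, sites: list, require_all: bool = False) -> dict:
    """Subset of records containing a site for any (or every) listed enzyme."""
    patterns = [site_pattern(s) for s in sites]
    combine = all if require_all else any
    return {rid: seq for rid, seq in records.items()
            if combine(p.search(seq) for p in patterns)}


records = {"r1": "AAGCTCTTCAAA", "r2": "ATATATAT", "r3": "TTGAAGAGCTT"}
print(sorted(records_with_sites(records, ["GCTCTTC"])))  # SapI: ['r1', 'r3']
```

Record r3 is found only through the reverse-strand alternative (GAAGAGC), which is why both strands are searched for a non-palindromic site such as SapI.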

In some embodiments, the resulting subset of genetic records is filtered
to exclude records that may lead to erroneous, skewed, or non-useful results,
or to include records with selected characteristics (block 306). For example, it
has been found that very long sequences in excess of 21,000 bp, a size likely to
represent one of the largest open reading frames, typically lead to erroneous,
skewed or non-useful results. Other filtering characteristics may also be used
and are within the scope of the present invention. Examples of such filtering
characteristics include filtering for (to exclude or include) a certain GC content,
the presence or absence of introns, specific amino acid compositions in the
predicted translation product of the open reading frames, similarity to known
genes in specific gene families, a particular isoelectric point of predicted protein
products of the open reading frames, and/or predicted membrane-spanning
proteins in the open reading frames. It should be noted that filtering may occur
at any point in the method. For example, the records may be filtered prior to
populating the genetic database, or as part of the query to create the subset of
records at block 304.
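
A sketch of the block 306 filter, using the 21,000 bp length cutoff given above and, as an optional example of another characteristic, a GC-content window. The GC window, the parameter names and the example records are illustrative assumptions, not values from the source.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def filter_records(records: dict, max_length: int = 21_000, gc_range=None) -> dict:
    """Drop records longer than max_length and, if gc_range=(low, high) is
    given, records whose GC fraction falls outside that window."""
    kept = {}
    for rid, seq in records.items():
        if len(seq) > max_length:
            continue
        if gc_range is not None:
            low, high = gc_range
            if not low <= gc_content(seq) <= high:
                continue
        kept[rid] = seq
    return kept


records = {"short": "ATGC", "long": "A" * 21_001, "gc_rich": "GGGG"}
print(sorted(filter_records(records)))                       # ['gc_rich', 'short']
print(sorted(filter_records(records, gc_range=(0.4, 0.6))))  # ['short']
```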

Next, a set of one or more statistics may be obtained by issuing one or
more queries on the subset of records having at least one restriction enzyme
recognition site (block 308). In some embodiments, the queries comprise
pattern-matching queries. The pattern may be specified in any of a number of ways
known in the art. For example, wildcard characters may be used to specify one
or more positions in the pattern, or regular expressions may be used to specify
the pattern. The present invention is not limited to any particular form for
specifying a pattern. Additionally, the pattern may be submitted as part of a
query to a database engine, or the pattern matching may be executed by a
program, such as a Visual Basic program, on records obtained by a query.

In some embodiments, the number of records having particular restriction
enzyme recognition sites is determined and reported (block 310). In some
embodiments, a record must contain recognition sites for all of a predetermined
set of restriction enzymes in order to be included in the statistics.
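
The block 310 statistic can be sketched as follows, under the embodiment in which a record is counted only if it contains sites for every enzyme in a predetermined set. Each enzyme is given as a list of literal sequences to look for; a palindromic site such as SgfI (GCGATCGC) or PmeI (GTTTAAAC) needs only one entry, whereas a non-palindromic site would need both strands listed. The function name and example records are hypothetical.

```python
def count_records_with_all_sites(records: dict, enzyme_sites: dict) -> tuple:
    """Return (number of records containing at least one site for every
    enzyme, that number as a percentage of all records)."""
    hits = 0
    for seq in records.values():
        if all(any(site in seq for site in sites) for sites in enzyme_sites.values()):
            hits += 1
    percent = 100.0 * hits / len(records) if records else 0.0
    return hits, percent


enzymes = {"SgfI": ["GCGATCGC"], "PmeI": ["GTTTAAAC"]}
records = {
    "a": "TTGCGATCGCAAGTTTAAACTT",  # both sites
    "b": "TTGCGATCGCAA",            # SgfI only
    "c": "ATATATAT",                # neither
}
hits, percent = count_records_with_all_sites(records, enzymes)
print(hits, round(percent, 1))  # 1 33.3
```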

In alternative embodiments, the number of restriction enzyme target sites
occurring in a record is determined and reported (block 312). In some of these
alternative embodiments, the record must contain recognition sites for all of a
predetermined set of restriction enzymes in order to be analyzed.
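
For block 312, the number of sites in each record can be counted with a zero-width regular-expression lookahead so that overlapping occurrences are not missed. In this sketch (names hypothetical), each pattern is an ordinary regular expression; listing both strands for a non-palindromic site and only one strand for a palindromic site avoids double counting.

```python
import re


def count_sites(seq: str, site_regexes: list) -> int:
    """Count occurrences of any listed pattern, including overlapping ones."""
    total = 0
    for regex in set(site_regexes):  # ignore accidental duplicates
        total += sum(1 for _ in re.finditer(f"(?={regex})", seq))
    return total


def sites_per_record(records: dict, site_regexes: list) -> dict:
    return {rid: count_sites(seq, site_regexes) for rid, seq in records.items()}


# EcoRI (palindromic): one strand is enough.
print(sites_per_record({"x": "GAATTCAAGAATTC"}, ["GAATTC"]))                # {'x': 2}
# SapI (non-palindromic): list the site and its reverse complement.
print(sites_per_record({"y": "GCTCTTCAAGAAGAGC"}, ["GCTCTTC", "GAAGAGC"]))  # {'y': 2}
```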

In further alternative embodiments, statistics regarding the bases at
ambiguous positions recognized or cleaved by hapaxomeric restriction enzymes
are determined and reported (block 314). These statistics are useful for
determining the distribution of bases in the ambiguous positions of those
restriction enzymes. Two examples of such ambiguity are the N's in the sites
recognized or cleaved by SfiI and SapI, as illustrated in Figure 1. In these
alternative embodiments, the identity of any ambiguous bases in the recognition
site(s), or of the bases between the recognition site(s) and the actual cleavage
site(s), of some or all of the predetermined restriction enzymes is determined and
reported along with one or more statistics on the identity of these bases.
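
A sketch of the block 314 statistics for SapI, which cleaves one base 3' of GCTCTTC on one strand and four bases away on the other, so the second to fourth bases after the site form a three-base overhang. The code scans both strands (adequate here because the SapI site is not palindromic; a palindromic site would be counted twice), collects those bases for each site, and tallies whole overhangs and per-position base counts. The function name is hypothetical, and the cut offsets are parameters so that other Type IIS enzymes with a literal recognition sequence could be handled by changing them.

```python
import re
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def overhang_statistics(records: dict, site: str = "GCTCTTC",
                        top_cut: int = 1, bottom_cut: int = 4):
    """Tally the overhang bases produced 3' of a Type IIS site on both strands.

    top_cut and bottom_cut are the distances (in bases after the site) at which
    the two strands are cut; the overhang spans the bases between them.
    """
    width = bottom_cut - top_cut
    overhangs = Counter()
    per_position = [Counter() for _ in range(width)]
    for seq in records.values():
        for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
            for match in re.finditer(f"(?={site})", strand):
                start = match.start() + len(site) + top_cut
                overhang = strand[start:start + width]
                if len(overhang) < width:
                    continue  # site too close to the end of the record
                overhangs[overhang] += 1
                for i, base in enumerate(overhang):
                    per_position[i][base] += 1
    return overhangs, per_position


overhangs, per_position = overhang_statistics({"r1": "AAGCTCTTCAGTCAA"})
print(dict(overhangs))        # {'GTC': 1}
print(dict(per_position[0]))  # {'G': 1}
```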

Figures 4-5 provide the frequency for various restriction enzyme
recognition sites in a variety of organisms determined by the method described
herein.

III. Vectors of the Invention

Donor or recipient vectors are used to transfer a DNA sequence of
interest to another vector (an acceptor vector) to generate a recipient
(expression) vector, e.g., one useful for expression of the DNA sequence of
interest. The DNA sequence of interest may be, e.g., one in a library, such as a
cDNA library, one in another vector, e.g., an expression vector, or one obtained
from an isolated fragment, e.g., a PCR fragment, and it is flanked by desirable
restriction enzyme recognition sites. The presence and position of desirable
restriction enzyme recognition sites in the acceptor vector and those flanking the
DNA sequence of interest permit the rapid subcloning or insertion of the DNA
sequence of interest into the acceptor vector in an oriented manner.

The acceptor vector may include sequences 5' and/or 3' to the desirable
restriction enzyme recognition sites which encode a peptide or polypeptide
(fusion partner), which sequences, when operably linked to the DNA sequence
of interest and expressed in a cell, cell lysate or in vitro transcription/translation
system, yield a fusion protein. Such a peptide or polypeptide may be located at
either the N- or C-terminus of the fusion protein. Alternatively, the fusion
protein may contain a peptide or polypeptide at both the N- and C-terminus, and
each peptide or polypeptide may be different. Alternatively, the DNA sequence
of interest may itself encode a fusion protein and, once combined with the
acceptor vector, result in a recipient vector which encodes a recombinant
polypeptide which includes one or more additional residues at the N-terminus,
C-terminus, or both the N- and C-termini, which residues are encoded by
sequences in the acceptor vector, e.g., those encoded by sequences 5' and/or 3' to
the desirable restriction enzyme recognition sites. Moreover, one or more amino
acid residues may be encoded by the exchange sites generated by the ligation of
the ends of the DNA sequence of interest and the acceptor vector.

In one embodiment, the peptide or polypeptide fusion partner is an
epitope tag, an affinity domain, a protease recognition site, or an enzyme, e.g.,
thioredoxin or dehalogenase. An epitope tag is a short peptide sequence that is
recognized by epitope-specific antibodies. A fusion protein comprising an
epitope tag can be simply and easily purified using an antibody bound to a
chromatography resin. The presence of the epitope tag further allows the
recombinant protein to be detected in subsequent assays, such as Western blots,



wo 2005/087932 PCT/US2004/031912 

without having to produce an antibody specific for the recombinant protein
itself. Examples of commonly used epitope tags include V5,
glutathione-S-transferase (GST), hemagglutinin (HA), FLAG, c-myc, RYIRS,
calmodulin binding domain, the peptide Phe-His-His-Thr-Thr, chitin binding
domain, and the like.

Affinity domains are generally peptide sequences that can interact with a
binding partner, such as one immobilized on a solid support. DNA sequences
encoding metal ion affinity sequences, such as those with multiple consecutive
single amino acids, e.g., histidine, when fused to the expressed protein, may be
used for one-step purification of the recombinant protein by high affinity binding
to a resin column, such as nickel sepharose. An endopeptidase recognition
sequence can be engineered between the polyamino acid tag and the protein of
interest to allow subsequent removal of the leader peptide by digestion with
enterokinase or other proteases. Sequences encoding peptides or proteins,
such as the chitin binding domain (which binds to chitin), GST (which binds to
glutathione), biotin (which binds to avidin and streptavidin), maltose binding
protein (MBP), a portion of staphylococcal protein A (SPA), a polyhistidine tract
(HISn), and the like, can also be used for facilitating purification of the protein of
interest. The affinity domain can be separated from the protein of interest by
methods well known in the art, including the use of inteins (protein self-splicing
elements; Chong et al., Gene, 192:271 (1997)). In one embodiment, sequences
for more than one fusion partner can be linked to sequences for a peptide or
polypeptide of interest, e.g., an affinity domain is linked to a protease cleavage
recognition site which is linked to a polypeptide of interest.

To prepare expression vectors intended to generate defined fusions at the
5' end of an open reading frame (e.g., where the acceptor vector does not contain
sequences 5' of the exchange site that encode a peptide or protein for fusion), a
desired restriction enzyme recognition site is placed at the desired start of
transcription in the vector. Care is taken to avoid introducing an ATG start
codon upstream of the exchange site that might initiate translation
inappropriately. For instance, fusion of an overhang generated by SgfI digestion
of an acceptor vector with a compatible overhang which is 5' to a start codon for
an open reading frame in a DNA fragment can yield a recombinant vector
containing a de novo start site for that open reading frame. Sequences from the
acceptor vector which are present in the recombinant vector include sequences 5'
to the overhang generated by SgfI digestion, which optionally include a suitably
positioned RBS. Optionally, sequences at the 5' end of the open reading frame
include a Kozak sequence or a portion thereof which, when present in mRNA, is
capable of binding the small subunit of a eukaryotic ribosome.
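The check for stray upstream start codons described above can be sketched as a simple scan of the vector leader. This is a minimal sketch, assuming the leader is given as the sense-strand sequence ending just before the intended start. It does not model Kozak context or RBS spacing, and the helper names are hypothetical.

```python
def upstream_atgs(leader):
    """Return 0-based positions of every ATG on the sense strand of a 5' leader.

    Any hit is a potential out-of-place translation start upstream of the
    exchange site; an empty list means the leader passes this simple check.
    """
    leader = leader.upper()
    return [i for i in range(len(leader) - 2) if leader[i:i + 3] == "ATG"]


def frame_relative_to_start(leader, pos):
    """Reading frame (0, 1 or 2) of an upstream ATG relative to the intended
    start codon, which is assumed to begin immediately after the leader.
    Frame 0 would add an N-terminal extension; 1 or 2 would start an
    out-of-frame product."""
    return (len(leader) - pos) % 3
```

For example, a leader ending in the SgfI recognition sequence GCGATCGC contains no ATG of its own, so a hit can only come from the flanking vector sequence.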

To prepare expression vectors intended to generate a fusion protein by
fusing a vector-encoded peptide or protein located at the N-terminus of a fusion
protein to a DNA sequence of interest (i.e., a translational fusion), the restriction
enzyme recognition site is positioned in the correct reading frame such that 1) an
open reading frame is maintained through the restriction enzyme recognition site
on the acceptor vector and 2) the reading frame in the restriction enzyme
recognition site on the acceptor vector is in frame with the reading frame found
in the restriction enzyme recognition site contained within the donor vector. In
addition, the appropriate restriction enzyme recognition site on the acceptor
vector is designed to avoid the introduction of in-frame stop codons. The DNA
sequence of interest contained within the donor vector is thus cloned in a
particular reading frame in the acceptor vector so as to facilitate the creation of
the desired N-terminal fusion protein. For example, fusion of SgfI sites at the 5'
end of a DNA sequence of interest and the 3' end of the acceptor vector can provide
read-through sequences.
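The two frame conditions for an N-terminal translational fusion can be checked mechanically, as sketched below. This is an illustration under stated assumptions, not the vectors of the invention. The vector-encoded partner plus junction is passed as one string with no stop codon, and the insert ends in its own stop. The partner sequence "ATGGCTAGC" and the junction "GCGATCGCC" (which contains an SgfI site) used in the tests are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(seq):
    """0-based positions of stop codons read in frame from the first base."""
    seq = seq.upper()
    return [i for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def n_terminal_fusion_is_open(upstream, insert_orf):
    """Check a translational fusion: vector-encoded partner + junction, then insert.

    upstream:   vector-encoded partner ORF plus junction bases (no stop codon)
    insert_orf: DNA sequence of interest, ending in its own stop codon
    """
    if len(upstream) % 3 != 0:
        return False  # the insert would be read in the wrong frame
    fused = (upstream + insert_orf).upper()
    # The only stop codon tolerated is the insert's own terminal stop.
    return in_frame_stops(fused) == [len(fused) - 3]
```

A junction that shifts the frame by one base, or that carries an in-frame TGA, fails the check even though the individual sequences look correct.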

Similarly, to prepare expression vectors intended to generate a fusion
protein by fusing a vector-encoded peptide or protein located at the C-terminus
of a fusion protein to a DNA sequence of interest, the restriction enzyme
recognition site is positioned in the correct reading frame such that 1) an open
reading frame is maintained through the restriction enzyme recognition site on
the acceptor vector and 2) the reading frame in the restriction enzyme
recognition site on the acceptor vector is in frame with the reading frame found
in the restriction enzyme recognition site contained within the donor vector, i.e.,
a site which flanks the DNA sequence of interest at the 3' end. The DNA
sequence of interest contained within the donor vector can thus be cloned in a
particular reading frame so as to facilitate the creation of the desired C-terminal
fusion protein. For instance, fusion of a PmeI site with an EcoRV or BalI site can
yield a C-terminal fusion with at least 2 amino acids added at the C-terminus,
while fusion of two PmeI sites, or of a PmeI site and a DraI site, can yield a
C-terminal fusion with a single amino acid added at the C-terminus.
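The residue counts quoted above follow from translating the blunt junctions. A minimal sketch, assuming the GTT of the PmeI half site GTTT is read as a codon in frame with the upstream open reading frame (the text does not state the frame explicitly). The cut positions used are PmeI GTTT^AAAC, EcoRV GAT^ATC, BalI TGG^CCA and DraI TTT^AAA, all blunt. The dictionary and function names are hypothetical.

```python
from itertools import product

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Blunt-end half sites (left of cut, right of cut).
BLUNT_HALVES = {
    "PmeI": ("GTTT", "AAAC"),   # GTTT^AAAC
    "EcoRV": ("GAT", "ATC"),    # GAT^ATC
    "BalI": ("TGG", "CCA"),     # TGG^CCA
    "DraI": ("TTT", "AAA"),     # TTT^AAA
}


def junction_residues(left_enzyme, right_enzyme):
    """Translate the codons fully contained in a blunt ligation junction.

    Assumes the left half site starts in frame with the upstream ORF.
    Translation halts at the first stop codon ('*').
    """
    junction = BLUNT_HALVES[left_enzyme][0] + BLUNT_HALVES[right_enzyme][1]
    residues = []
    for i in range(0, len(junction) - 2, 3):
        aa = CODON_TABLE[junction[i:i + 3]]
        residues.append(aa)
        if aa == "*":
            break
    return "".join(residues)
```

Under this frame assumption, PmeI/EcoRV gives "VY" and PmeI/BalI gives "VS", with translation continuing into vector-encoded sequence. PmeI/PmeI and PmeI/DraI both give "V*": one added residue (Val) followed by a TAA stop codon.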

In one embodiment, the expression vector encodes a protein with
multiple fusion partners, e.g., an affinity tag for purification and a protease
cleavage site fused to a protein of interest.

Use of the cloning system herein makes it possible to bring the protein
sequence to be expressed into close proximity to the N-terminal and/or C-terminal
fusion partner. A particular advantage is that the reading frame can be
selected. This makes it possible not only to position the DNA sequence of
interest exactly but also to define the ends of the fusion gene.

The vectors employed in the practice of the invention also contain one or
more nucleic acid sequences that generally have some function in the replication,
maintenance or integrity of the vector, e.g., origins of replication, as well as one
or more selectable marker genes. Replication origins are unique DNA segments
that contain multiple short repeated sequences that are recognized by multimeric
origin-binding proteins and which play a key role in assembling DNA replication
enzymes at the origin site. Suitable origins of replication for use in expression
vectors employed herein include E. coli oriC, the ColE1 plasmid origin, 2µ and ARS
(both useful in yeast systems), f1, SV40 and EBV oriP (useful in mammalian
systems), p15, those found in pSC101, and the like.

Selection marker sequences are valuable elements in vectors as they
provide a means to select for or against growth of cells which have been
successfully transformed with a vector containing the selection marker sequence
and which express the marker. Such markers are generally of two types: drug resistance
and auxotrophic. A drug resistance marker enables cells to detoxify an
exogenously added drug that would otherwise kill the cell. An auxotrophic
marker allows cells to synthesize an essential component (usually an amino acid)
while grown in media which lack that essential component.

A wide variety of selectable marker genes are available (see, for
example, Kaufman, Meth. Enzymol., 185:487 (1990); Kaufman, Meth.
Enzymol., 185:537 (1990); Srivastava and Schlessinger, Gene, 103:53 (1991);
Romanos et al., in DNA Cloning 2: Expression Systems, 2nd Edition, pages
123-167 (IRL Press 1995); Markie, Methods Mol. Biol., 54:359 (1996); Pfeifer
et al., Gene, 188:183 (1997); Tucker and Burke, Gene, 199:25 (1997);
Hashida-Okado et al., FEBS Letters, 425:117 (1998)). Common selectable marker gene
sequences include those for resistance to antibiotics such as ampicillin,
tetracycline, kanamycin, bleomycin, streptomycin, hygromycin, neomycin,
Zeocin™, and the like. Selectable auxotrophic gene sequences include, for
example, hisD, which allows growth in histidine-free media in the presence of
histidinol.

Suitable selectable marker genes include a bleomycin-resistance gene, a
metallothionein gene, a hygromycin B-phosphotransferase gene, the AUR1 gene,
an adenosine deaminase gene, an aminoglycoside phosphotransferase gene, a
dihydrofolate reductase gene, a thymidine kinase gene, a xanthine-guanine
phosphoribosyltransferase gene, and the like.

An alternative approach is to use a selectable marker gene that encodes a
mutated enzyme that is less active than the corresponding wild-type enzyme. As
an illustration, Munir et al., Protein Eng., 7:83 (1994), describe the design of
mutant thymidine kinase enzymes with decreased activity (also see Liu and
Summers, Virology, 163:638 (1988); Mendel et al., Antimicrob. Agents
Chemother., 39:2120 (1995)). Low activity mutants have also been described for
adenosine deaminase and dihydrofolate reductase (see, for example, Prendergast
et al., Biochemistry, 27:3664 (1988); Jiang et al., Hum. Mol. Genet., 6:2271
(1997); Ercikan-Abali et al., Mol. Pharmacol., 49:430 (1996)).

Another type of marker gene is a gene that produces a readily detectable
protein, such as green fluorescent protein, red fluorescent protein, an enzyme
(e.g., placental alkaline phosphatase, beta-galactosidase, beta-lactamase, or
luciferase), or a cell surface protein that can be detected with an antibody (e.g.,
CD4, CD8, or a Class I major histocompatibility complex (MHC) protein). The
expression products of such marker genes can be used to sort
transfected cells from untransfected cells by standard means, e.g., FACS
sorting or magnetic bead separation technology.

Metallothionein genes encode proteins that have a high affinity for toxic
metals, such as cadmium, zinc, and copper (Beach and Palmiter, Proc. Natl.
Acad. Sci. USA, 78:2110 (1981); Huang et al., EXS, 52:439 (1987); Czaja et al.,
J. Cell. Physiol., 147:434 (1991)). Accordingly, metallothionein genes provide
suitable titratable markers for the methods described herein.
In one embodiment, the acceptor vector includes a counterselectable gene
flanked by desirable restriction enzyme sites. Preferred genes in this regard
include but are not limited to lethal genes, such as those which are inducible with
low to no constitutive activity (and preferably with some immunity factor), e.g.,
genes such as barnase (barstar), those encoding a restriction enzyme (a gene
encoding a corresponding methylase), or those encoding nuclease colicins, e.g.,
E9 DNase, and colicin RNases and tRNases, or gyrase A, as well as
MazF (ChpAK), Doc (Phd), ParE, PasB, StbOrf2, HigB, 2, RelE, Txe, YeoB,
SacB, KM, KorA, KorB, Kid (Kis), PemK (PemI), Hok (Sok), Dec (Pno), CcdB
(CcdA), F' plasmid, and the like.

Other selection approaches include the use of regulated transcriptional
modulators, e.g., a tetracycline-inducible or -repressible system (see, for
instance, WO 96/01313).

The acceptor vectors employed in the practice of the invention also
contain one or more nucleic acid sequences that have some function in the
expression of a protein, i.e., transcriptional regulatory sequences, for instance,
inducible or repressible control sequences such as promoter or enhancer
sequences.

Promoter-enhancer sequences are DNA sequences to which RNA
polymerase binds and at which it initiates transcription. The promoter determines the
polarity of the transcript by specifying which strand will be transcribed.
Bacterial promoters consist of consensus sequences at -35 and -10 nucleotides
relative to the transcriptional start, which are bound by a specific sigma factor
and RNA polymerase. Eukaryotic promoters are more complex. Most promoters
utilized in vectors are transcribed by RNA polymerase II. General transcription
factors (GTFs) first bind specific sequences near the start and then recruit the
binding of RNA polymerase II. In addition to these minimal promoter elements,
small sequence elements are recognized specifically by modular
DNA-binding/trans-activating proteins (e.g., AP-1, SP-1) that regulate the activity of a
given promoter. Viral promoters serve the same function as bacterial or
eukaryotic promoters and either provide a specific RNA polymerase in trans
(bacteriophage T7) or recruit cellular factors and RNA polymerase (SV40, RSV,
CMV). Viral promoters may be preferred as they are generally particularly
strong promoters.
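The -35/-10 arrangement of bacterial promoters can be illustrated with a simplified consensus scan. This sketch assumes the E. coli sigma-70 consensus boxes TTGACA (-35) and TATAAT (-10) separated by a 16-18 bp spacer. It scores hits by simple mismatch counting, so it is an illustration of the layout, not a promoter predictor. Function names and the mismatch threshold are hypothetical.

```python
MINUS35_CONSENSUS = "TTGACA"
MINUS10_CONSENSUS = "TATAAT"


def mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_sigma70_like(seq, spacers=(16, 17, 18), max_mismatches=2):
    """Return (start of -35 box, spacer length, total mismatches) for each hit."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        mm35 = mismatches(seq[i:i + 6], MINUS35_CONSENSUS)
        if mm35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + 6 + spacer          # start of the -10 box
            box10 = seq[j:j + 6]
            if len(box10) < 6:
                continue                # runs off the end of the sequence
            total = mm35 + mismatches(box10, MINUS10_CONSENSUS)
            if total <= max_mismatches:
                hits.append((i, spacer, total))
    return hits
```

Natural promoters rarely match both boxes perfectly, which is why a mismatch allowance is included; the sigma factor, not the scan, ultimately defines which sites are bound.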
Promoters may, furthermore, be either constitutive or regulatable (i.e.,
inducible or derepressible). Inducible elements are DNA sequence elements
which act in conjunction with promoters and bind either repressors (e.g., the
lacO/LacIq repressor system in E. coli) or inducers (e.g., the Gal4/GAL4 inducer
system in yeast or the rhaBAD/rhamnose system in E. coli). In either case, transcription is
virtually "shut off" until the promoter is derepressed or induced, at which point
transcription is "turned on".

Examples of constitutive promoters include the int promoter of
bacteriophage λ, the bla promoter of the β-lactamase gene sequence of pBR322,
the CAT promoter of the chloramphenicol acetyl transferase gene sequence of
pPR325, and the like. Examples of inducible prokaryotic promoters include the
major right and left promoters of bacteriophage λ (PL and PR), the trp, recA, lacZ,
lacI, araC and gal promoters of E. coli, the α-amylase (Ulmanen et al., J.
Bacteriol., 162:176 (1985)), the araBAD promoter, the rhaBAD promoter, and
the sigma-28-specific promoters of B. subtilis (Gilman et al., Gene, 32:11
(1984)), the promoters of the bacteriophages of Bacillus (Gryczan, In: The
Molecular Biology of the Bacilli, Academic Press, Inc., NY, 1982),
Streptomyces promoters (Ward et al., Mol. Gen. Genet., 203:468 (1986)), Pichia
promoters (U.S. Pat. Nos. 4,855,231 and 4,808,537), and the like. Exemplary
prokaryotic promoters are reviewed by Glick (J. Ind. Microbiol., 1:277 (1987)),
Cenatiempo (Biochimie, 68:505 (1986)), and Gottesman (Ann. Rev. Genet.,
18:415 (1984)). In one embodiment, the promoter is a T7 promoter or an SP6
promoter.

Preferred eukaryotic promoters include, for example, the promoter of the
mouse metallothionein I gene sequence (Hamer et al., J. Mol. Appl. Gen., 1:273
(1982)); the TK promoter of Herpes virus (McKnight, Cell, 31:355 (1982)); the
SV40 early promoter (Benoist et al., Nature (London), 290:304 (1981)); the yeast
gal4 gene sequence promoter (Johnston et al., Proc. Natl. Acad. Sci. (USA),
79:6971 (1982); Silver et al., Proc. Natl. Acad. Sci. (USA), 81:5951 (1984)); a
baculovirus promoter; the CMV promoter; the EF-1 promoter; ecdysone-responsive
promoter(s); a tetracycline-responsive promoter; and the like.

Suitable prokaryotic vectors include plasmids such as those capable of
replication in E. coli (for example, pBR322, ColE1, pSC101, pACYC 184, πVX,
pRSET, pBAD (Invitrogen, Carlsbad, Calif.), and the like). Such plasmids are
disclosed by Sambrook (cf. Molecular Cloning: A Laboratory Manual, second
edition, edited by Sambrook, Fritsch, & Maniatis, Cold Spring Harbor
Laboratory, 1989). Bacillus plasmids include pC194, pC221, pT127, and the
like, and are disclosed by Gryczan (In: The Molecular Biology of the Bacilli,
supra, pp. 307-329). Suitable Streptomyces plasmids include pIJ101 (Kendall et
al., J. Bacteriol., 169:4177 (1987)), and Streptomyces bacteriophages such as
φC31 (Chater et al., In: Sixth International Symposium on Actinomycetales
Biology, Akademiai Kaido, Budapest, Hungary, pp. 45-54, 1986). Pseudomonas
plasmids are reviewed by John et al. (Rev. Infect. Dis., 8:693 (1986)) and Izaki
(Jpn. J. Bacteriol., 33:729 (1978)). In one embodiment, the vector backbone for
an acceptor vector for expression of linked sequences in E. coli includes an
ampicillin resistance gene, T7 transcriptional regulatory elements, and sequences for producing a
fusion protein such as a GST, thioredoxin or dehalogenase fusion with a protein
of interest.

Suitable eukaryotic plasmids include, for example, BPV, EBV, vaccinia,
SV40, 2-micron circle, pCI-neo, pcDNA3.1, pcDNA3.1/GS, pYES2/GS, pMT,
pIND, pIND(SP1), pVgRXR (Invitrogen), and the like, or their derivatives. Such
plasmids are well known in the art (Botstein et al., Miami Wntr. Symp., 19:265
(1982); Broach, In: The Molecular Biology of the Yeast Saccharomyces: Life
Cycle and Inheritance, Cold Spring Harbor Laboratory, Cold Spring Harbor,
NY, pp. 445-470, 1981; Broach, Cell, 28:203 (1982); Bollon et al., J. Clin.
Hematol. Oncol., 10:39 (1980); Maniatis, In: Cell Biology: A Comprehensive
Treatise, Vol. 3, Gene Expression, Academic Press, NY, pp. 563-608,
1980). In one embodiment, the vector backbone for an acceptor vector for
expression of linked sequences in mammalian cells or an in vitro eukaryotic
transcription/translation reaction is pCMVTnT (Promega Corp.), together with sequences
for producing a fusion protein such as a GST or dehalogenase fusion with a
protein of interest.

Promoter/plasmid combinations are employed with suitable host cells,
e.g., prokaryotic cells, such as E. coli, Streptomyces, Pseudomonas and Bacillus,
or eukaryotic cells, such as yeast, e.g., Pichia, Saccharomyces or
Schizosaccharomyces, insect cells, avian cells, plant cells, or mammalian cells,
e.g., human, simian, porcine, ovine, rodent, bovine, equine, caprine, canine or
feline cells, as well as lysates thereof, e.g., TNT, wheat germ lysates or S30
lysates.

In one embodiment, the host cell is a recombinant cell, e.g., a
recombinant prokaryotic cell. In one embodiment, the recombinant host cell is
deficient in one or more genes in an inducible pathway, e.g., a sugar pathway
such as the rhamnose catabolic pathway, and comprises a recombinant DNA
comprising an inducible promoter for the one or more genes operably linked to
an open reading frame for a heterologous RNA polymerase. The recombinant
host cell, a lysate thereof, or an in vitro transcription/translation mixture
supplemented with the heterologous RNA polymerase is contacted with a vector
of the invention comprising a promoter for the heterologous RNA polymerase
operably linked to a DNA sequence of interest. In one embodiment, the
recombinant host cell is a recombinant E. coli cell that is deficient in rhamnose
catabolism and comprises a rhaBAD promoter operably linked to a T7 RNA
polymerase open reading frame. In the absence of rhamnose, such a cell has no
or low levels of T7 RNA polymerase and so is particularly useful for cloning toxic
genes.

In another embodiment, the recombinant host cell expresses an immunity
factor for a gene product that is lethal to the cell. The immunity factor is
preferably expressed from a constitutive promoter. An expression vector
encoding the lethal gene product may be introduced to the recombinant cell and
the transformed cell propagated. In one embodiment, the gene product is barnase
which has been modified by deleting sequences for the secretory segment (signal
peptide) and optionally adding an ATG in place of the last codon for the secretory
sequence.

IV. Use of DNA Binding Proteins to Protect Restriction Enzyme Sites 

In the process of introducing a DNA sequence of interest to a donor
vector, or from a donor vector to an acceptor vector, restriction enzyme sites
which flank the DNA sequence of interest, i.e., those useful in cloning, may also
be present in either the DNA sequence of interest or vector sequences. To
protect a particular restriction enzyme site from cleavage by the
corresponding enzyme, DNA binding proteins and methylation may be
employed. For instance, the process of protecting a restriction site with RecA
(RecA cleavage and protection) is more reproducible, provides better yields and
is less cumbersome than partial restriction digests. Other means of protecting a
restriction site include using repressor proteins, eukaryotic transcription factors,
E. coli host integration factor or oligonucleotides capable of forming a triple
helix structure; however, the specificity of protection using RecA derives entirely
from the synthetic single-stranded DNA. In the presence of a nonhydrolyzable
ATP analog such as ATP[gamma-S], the RecA protein nonspecifically binds to
single-stranded DNA (ssDNA) (approximately one RecA monomer per three
nucleotides) to form a structure called a presynaptic filament. This RecA-coated
oligonucleotide then anneals with homologous duplex DNA to form a stable
triplex DNA-protein complex. The presynaptic filament represents a useful
molecular research tool in that: i) the sequence and length of the ssDNA added
to the reaction determine the site and span of the presynaptic filament and ii)
the presynaptic filament protects the DNA at the hybridization site from
modification by DNA methylases and restriction enzymes. These features
enable RecA protein-mediated DNA complexes to add a new level of specificity
to molecular biology applications that require DNA cleavage at predetermined
sites, such as genomic mapping and the subcloning of DNA fragments.
Compared to PCR methods, the use of a DNA binding protein is quicker and
does not introduce mutations arising from multiple cycles of in vitro
amplification.

The general protocols include protecting a restriction site from
methylation, making it unique for restriction enzyme cleavage (RecA cleavage),
and protecting a restriction site from digestion (RecA protection). The RecA
cleavage protocol is based on the RecA Achilles' cleavage procedure of Koob et
al. (Science, 241:1084 (1988)), Koob et al. (Gene, 74:165 (1988)), and Koob et
al. (Nucleic Acids Res., 20:5831 (1992)). RecA cleavage is also useful
for generating restriction fragments for subcloning when the desired restriction
site is repeated several times within the fragment. However, if only one or two
restriction sites are repeated within the desired fragment, RecA protection is
preferred. Based on fluorometric analysis of the RecA products after
electrophoresis, these two protocols routinely resulted in 70% to 80% protection
when a single site was protected. This technique can also be used for DNA
embedded in agarose plugs.
Tables

Oligonucleotides: Prepared by the user to be specific for the intended
protected site. Diluted to 160 ng/µl.

Methylase: In theory, any restriction enzyme/methylase pair could be used.
In these protocols, 35 µl of EcoRI methylase was used.

RecA: 1-3 mg/ml.

Restriction enzyme: In these protocols, 12 u/µl EcoRI was used.

SAM: 1.6 mM S-adenosylmethionine. Prepared immediately before use from a
32 mM stock by dilution with ice-cold 5 mM sulfuric acid.

Buffer A: 250 mM Tris-acetate (pH 7.5 at 25°C), 1 mM magnesium acetate.

ATP[gamma-S]: Aliquots of a 10 mM solution (in water) are stored at -70°C.

Buffer B: 166 mM Tris-acetate (pH 7.5 at 25°C), 37 mM magnesium acetate,
100 mM DTT.

80 mM magnesium acetate.

250 mM potassium acetate.

Restriction Enzyme Buffer H (Promega).
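As a worked example of preparing the 1.6 mM SAM working solution from the 32 mM stock listed above, assuming a hypothetical final volume of 20 µl (the volume is illustrative, not taken from the protocol):

```latex
V_{\text{stock}} = \frac{C_{\text{final}} \, V_{\text{final}}}{C_{\text{stock}}}
= \frac{1.6\ \text{mM} \times 20\ \mu\text{l}}{32\ \text{mM}} = 1.0\ \mu\text{l}
```

That is, 1.0 µl of stock plus 19 µl of ice-cold 5 mM sulfuric acid, a 1:20 dilution; other final volumes scale proportionally.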





A. RecA Cleavage or Protection Reactions

The RecA Concentration

To maximize the specificity and efficiency of RecA protection, it may be
necessary to manipulate the oligonucleotide:RecA ratio; 6.25 µg of RecA in a
10 µl reaction works well.

The Oligonucleotide Concentration

The molar stoichiometry (in terms of moles of nucleotides to moles of
RecA protein) of the binding of the oligonucleotide to RecA is 3:1. In other
words, one RecA protein binds every three nucleotides of any single-stranded
DNA. This ratio is independent of oligonucleotide size and corresponds to 160
ng of oligonucleotide per 6.25 µg RecA. A titration series of 40-280 ng in 40 ng
increments is useful to determine the optimal concentration of oligonucleotide to
use with the RecA. If nonspecific protection is a problem, then 160 ng of
oligo(dT) can be added to the reaction after the addition of ATP[gamma-S].
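The 160 ng figure can be reproduced from the 3:1 nucleotide:RecA ratio. A minimal sketch, assuming a RecA monomer mass of about 37.8 kDa and an average single-stranded nucleotide mass of 330 g/mol; neither value is stated in the text, and the helper names are hypothetical.

```python
RECA_MONOMER_G_PER_MOL = 37_842  # assumption: ~37.8 kDa E. coli RecA monomer
NUCLEOTIDE_G_PER_MOL = 330       # assumption: average mass of one ssDNA nucleotide
NT_PER_RECA = 3                  # from the text: one RecA per three nucleotides


def oligo_ng_for_reca(reca_ug):
    """Oligonucleotide mass (ng) matching reca_ug of RecA at 3 nt per monomer."""
    reca_pmol = reca_ug / RECA_MONOMER_G_PER_MOL * 1e6   # ug / (g/mol) = umol -> pmol
    nt_pmol = reca_pmol * NT_PER_RECA
    return nt_pmol * NUCLEOTIDE_G_PER_MOL / 1e3          # pmol x g/mol = pg -> ng


def titration_series(start_ng=40, stop_ng=280, step_ng=40):
    """Oligonucleotide amounts for the 40-280 ng titration in 40 ng steps."""
    return list(range(start_ng, stop_ng + 1, step_ng))


def volume_ul(amount, stock_per_ul):
    """Volume of stock needed, e.g. ng / (ng/ul) or ug / (ug/ul)."""
    return amount / stock_per_ul
```

With these assumed masses, 6.25 µg of RecA is about 165 pmol of monomer, which pairs with about 495 pmol of nucleotides, or roughly 163 ng of oligonucleotide, consistent with the 160 ng quoted above. At the listed oligonucleotide stock of 160 ng/µl, 160 ng is 1 µl; for a RecA stock of 2 mg/ml (within the listed 1-3 mg/ml range), 6.25 µg is 3.125 µl.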

Design of Oligonucleotide

An oligonucleotide of 30 to 36 bases in length is recommended for both
RecA cleavage and RecA protection in solution. The protected site was located
in the middle of the 30-base oligonucleotide used throughout the development of
this protocol (see also "RecA Cleavage and Protection for Genomic Mapping and
Subcloning," Promega Notes Magazine #50).
Buffer

It may be necessary to adjust the salt concentration to improve the
activity of the enzyme after methylation. Acetate salts appear to be less
destabilizing to the RecA triplex than chloride salts, and thus potassium acetate
rather than potassium or sodium chloride may be employed.

Subcloning the Products of RecA Cleavage

Because the products of a RecA cleavage reaction are methylated, low
transformation frequencies may arise from incompatibilities with the host's
restriction/modification system. If transformation efficiencies are low, compare
the genotype of the host to the known methylation-induced restriction systems to
determine if this is the cause.
V. Exemplary Vector Systems

In one embodiment, at least one of the restriction enzyme sites in the
donor vector and/or flanking the DNA sequence of interest is for a restriction
enzyme with a degenerate recognition sequence; e.g., SfiI is a restriction enzyme
with a degenerate recognition sequence that recognizes an interrupted
palindromic sequence (Figure 6). To employ restriction enzymes that recognize
an interrupted palindromic sequence and generate single-strand DNA overhangs
for directional cloning, at least two unique sites for that restriction enzyme
and/or unique site(s) for a different restriction enzyme that generates non-self
complementary single-strand DNA overhangs complementary to the
overhangs generated by the first restriction enzyme are employed. Other
methods may be used to enhance the frequency of desired vectors, e.g., the use
of methylation and/or selectable and counterselectable genes.
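The requirement for two distinct, non-self complementary overhangs can be checked as sketched below. This assumes SfiI cleavage at GGCCNNNN^NGGCC, which leaves a 3-nucleotide 3' overhang formed by the second to fourth N bases. The site sequences in the tests are illustrative, and all function names are hypothetical. A 3-nucleotide overhang can never equal its own reverse complement, because its middle base would have to pair with itself; the remaining risks are circularization and reversed insertion.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def sfii_overhang(site):
    """3-nt 3' overhang left by SfiI (GGCCNNNN^NGGCC), top strand, 5'->3'.

    The top strand is cut after position 8 and the bottom strand after
    position 5 (1-based), so the overhang is positions 6-8 of the site.
    """
    site = site.upper()
    if len(site) != 13 or site[:4] != "GGCC" or site[9:] != "GGCC":
        raise ValueError("not an SfiI site: " + site)
    return site[5:8]


def is_self_complementary(overhang):
    return overhang == revcomp(overhang)


def ends_anneal(overhang1, overhang2):
    """Two 3' overhangs (each 5'->3') pair if one is the reverse complement of the other."""
    return overhang1 == revcomp(overhang2)


def directional_ok(site_a, site_b):
    """True if an insert flanked by site_a (5' end) and site_b (3' end) can enter a
    vector opened at the same two sites in one orientation only, and cannot circularize."""
    a, b = sfii_overhang(site_a), sfii_overhang(site_b)
    insert_left, insert_right = revcomp(a), b      # bottom- and top-strand overhangs
    vector_left, vector_right = a, revcomp(b)
    correct = ends_anneal(vector_left, insert_left) and ends_anneal(insert_right, vector_right)
    flipped = ends_anneal(vector_left, insert_right) or ends_anneal(insert_left, vector_right)
    circular = ends_anneal(insert_left, insert_right)
    return correct and not flipped and not circular
```

Using the same site at both ends fails the check, since the insert could then close on itself and enter the vector in either orientation.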

Figure 7 shows a schematic of the use of donor and acceptor vectors
having restriction enzyme sites for a restriction enzyme which recognizes an
interrupted palindrome (enzyme I; the unique sequences are indicated by A and
B, their complements by A' and B', respectively, and the palindromic sequences
by boxes). The donor vector has a drug resistance gene 1 and a DNA sequence
of interest (light grey box) flanked by one or more restriction enzyme sites for
the restriction enzyme which recognizes an interrupted palindrome. The acceptor
vector has a different drug resistance gene (drug resistance gene 2) and, after
digestion with a restriction enzyme with a degenerate recognition sequence, has
non-self complementary single-strand DNA overhangs A and B' which are,
respectively, complementary with the non-self complementary single-strand
DNA overhangs present after digestion of the donor vector with enzyme I. Thus,
after digestion of the donor vector with enzyme I and in the presence of the
linearized acceptor vector and ligase, the linearized DNA sequence of interest is
joined in an oriented manner to the acceptor vector to yield a recipient vector.
In Figure 7A, one half site of the restriction site for enzyme I is present at each
end of the DNA sequence of interest in the recipient vector. If the ligation
regenerates the restriction site, then there is a competing back reaction (Figure
7B). In Figure 7C, a counterselectable gene (a lethal gene) is employed in the
acceptor vector so that cells with the recipient vector rather than the acceptor
vector can be readily identified.

Figure 8 shows one method by which a DNA sequence of interest is
modified to contain restriction enzyme sites for a restriction enzyme with a
degenerate recognition sequence. Oligonucleotides having unique degenerate
sequences for the restriction enzyme at the 5' end, and sequences complementary
to one of the strands of the DNA sequence of interest at the 3' end, are employed
in an amplification reaction. Those unique sequences are also present in a vector
containing a drug resistance gene. The amplified fragment and the vector are
digested with the restriction enzyme, and ligase is added to yield a donor vector of
the invention. If the sites are recognized by restriction enzymes which are
sensitive to the methylation state of DNA, e.g., at Dcm sites or using a methylase
for SfiI, methylation may minimize the back reaction. The donor vector is then
digested with one or more restriction enzymes having degenerate recognition sequences
that release the DNA sequence of interest, and mixed with an acceptor
vector having complementary single-strand DNA overhangs generated by, for
example, a different enzyme with a degenerate recognition sequence that
generates non-self complementary single-strand DNA overhangs.
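The primer layout described for Figure 8 (site-bearing 5' tails, gene-specific 3' ends) can be sketched as follows. This is a minimal sketch, assuming an 18-nucleotide annealing region and a short hypothetical 5' clamp "GC" outside each site. The SfiI site sequences in the tests are illustrative, and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(orf, site_a, site_b, anneal_len=18, clamp="GC"):
    """Forward and reverse primers (5'->3') adding site_a upstream and site_b
    downstream of orf. The 3' ends anneal to the first and last anneal_len bases."""
    orf = orf.upper()
    forward = clamp + site_a.upper() + orf[:anneal_len]
    reverse = clamp + revcomp(site_b) + revcomp(orf[-anneal_len:])
    return forward, reverse


def expected_amplicon(orf, site_a, site_b, clamp="GC"):
    """Top strand of the PCR product made with design_primers()."""
    return clamp + site_a.upper() + orf.upper() + site_b.upper() + revcomp(clamp)
```

Because the two sites carry different interior bases, the amplified fragment can be digested and ligated into a vector carrying the same pair of sites in a fixed orientation.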

Figures 9A-B show another approach to preparing a donor vector of the invention. A DNA sequence of interest is modified to contain sites for a restriction enzyme with a degenerate recognition sequence. Oligonucleotides having unique degenerate sequences for the restriction enzyme at the 5' end, and sequences complementary to one of the strands of the DNA sequence of interest at the 3' end, are employed in an amplification reaction. The DNA sequence of interest may include internal sites for that restriction enzyme. To protect those internal sites from digestion, they are methylated, while the flanking sites at the ends of the amplified fragment remain unmethylated and therefore sensitive to digestion. To accomplish this, oligonucleotides complementary to the sites which are to remain unmethylated and a DNA binding protein such as RecA are added to the amplified fragment. The internal sites are then methylated with an appropriate methylase. A column may be employed to remove the oligonucleotide-DNA binding protein complexes from the amplified fragment. The sites which were added to the ends of the DNA sequence of interest, once digested, yield non-self-complementary single-strand DNA overhangs. Complementary overhangs may be generated in a vector by digestion with a selected restriction enzyme with degenerate recognition sites, which enzyme may be different from the enzyme employed to digest the amplified fragment. The amplified fragment and the vector are then digested with the one or more restriction enzymes, and the resulting linear fragments are ligated to form a donor vector containing a drug resistance gene and the DNA sequence of interest flanked by sites generated by the joining of the complementary single-strand DNA overhangs, which sites are recognized by one or more restriction enzymes with a degenerate recognition sequence, e.g., the enzyme employed to digest the amplified fragment.
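The protection scheme reduces to a simple rule: sites covered by an oligonucleotide/DNA binding protein complex escape the methylase and stay cleavable, while every other site is methylated and resists digestion. A minimal Python sketch of that bookkeeping follows. It assumes, hypothetically, that a site counts as protected only when a complex spans the whole recognition site, and it models no chemistry; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    position: int  # 0-based start of the recognition site
    length: int    # length of the recognition site in bp


def is_covered(site: Site, region: tuple[int, int]) -> bool:
    """True if the half-open region [start, end) spans the entire site."""
    start, end = region
    return start <= site.position and site.position + site.length <= end


def cleavable_sites(sites: list[Site], protected: list[tuple[int, int]]) -> list[Site]:
    """Sites shielded from the methylase remain unmethylated, hence cleavable."""
    return [s for s in sites if any(is_covered(s, r) for r in protected)]


def methylated_sites(sites: list[Site], protected: list[tuple[int, int]]) -> list[Site]:
    """All remaining (internal) sites are methylated and resist digestion."""
    keep = cleavable_sites(sites, protected)
    return [s for s in sites if s not in keep]
```

With flanking sites at positions 2 and 400 protected, only those two are cut after the complexes are removed, and internal sites at 100 and 250 stay intact.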

Figures 10A-B illustrate an approach to prepare a recipient vector of the invention. In this embodiment, a donor vector comprises a drug resistance gene and a DNA sequence of interest flanked by sites for an enzyme with a degenerate recognition sequence, and contains one or more of those sites internally. To protect those internal sites from digestion, they are methylated. To ensure that the flanking sites remain unmethylated and thus sensitive to digestion, oligonucleotides complementary to the sites which are to remain unmethylated and a DNA binding protein are added to the donor vector. The site(s) for the restriction enzyme which are not bound by the oligonucleotide/DNA binding protein are then methylated with an appropriate methylase. A column may be employed to remove the oligonucleotide-DNA binding protein complexes from the donor vector. The donor vector is then added to an acceptor vector having at least two recognition sites for a restriction enzyme with a degenerate recognition sequence, which restriction enzyme produces non-self-complementary single-strand DNA overhangs which are complementary to the overhangs generated by digestion of the donor vector with a restriction enzyme that cleaves the unmethylated sites. The acceptor vector preferably comprises a drug resistance gene which is different from the drug resistance gene in the donor vector. In one embodiment, the restriction enzyme used to digest the acceptor vector may be different from the restriction enzyme employed to digest the donor vector. Subsequent ligation of the linearized DNA fragments obtained by digestion of the donor and acceptor vectors yields a recipient vector.

In one embodiment, the restriction enzyme used to linearize the donor vector and the acceptor vector is the same. For instance, the donor vector has unique SfiI sites flanking the DNA sequence of interest, which sites, once digested with SfiI, yield non-self-complementary single-strand DNA overhangs that are complementary to the single-strand DNA overhangs generated after digestion of the acceptor vector with SfiI. In another embodiment, the donor vector has unique BglI sites flanking the DNA sequence of interest, which sites, once digested with BglI, yield non-self-complementary single-strand DNA overhangs that are complementary to the single-strand DNA overhangs generated after digestion of the acceptor vector with BglI. In another embodiment, the restriction enzymes with a degenerate recognition sequence used to linearize the donor vector and the acceptor vector are different. For instance, the donor vector has unique SfiI sites flanking the DNA sequence of interest, which sites, once digested with SfiI, yield non-self-complementary single-strand DNA overhangs that are complementary to the single-strand DNA overhangs generated after digestion of the acceptor vector with BglI. Restriction enzymes useful with SfiI in preparing donor and acceptor vectors are shown in Figure 11. Methylases for SfiI and/or BglI may be obtained by well-known methods; see, e.g., U.S. Patent Nos. 5,179,015, 5,200,333, and 5,320,957. For instance, the preparation of recombinant BglI and its corresponding methylase is disclosed in U.S. Patent No. 5,366,882. The preparation of recombinant SfiI and a corresponding methylase is provided in U.S. Patent No. 5,637,476. Other methylases useful with vectors containing SfiI recognition sites include the methylase for HaeIII and Dcm methylase.
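SfiI cuts GGCCNNNN^NGGCC and BglI cuts GCCNNNN^NGGC. Each therefore leaves a 3-nt 3' overhang made of the three central degenerate bases, and two ends can be joined only when those bases match. That is what allows an SfiI-cut donor to pair with either an SfiI-cut or a BglI-cut acceptor. The Python sketch below encodes those cut positions; the helper names are hypothetical. It also checks that no 3-nt overhang can be self-complementary, because the middle base would have to pair with itself.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# (top-strand cut offset, bottom-strand cut offset in top-strand coordinates,
#  site length). Top cut > bottom cut means a 3' overhang site[bottom:top].
CUTS = {
    "SfiI": (8, 5, 13),  # GGCCNNNN^NGGCC
    "BglI": (7, 4, 11),  # GCCNNNN^NGGC
}


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def overhang(enzyme: str, site: str) -> str:
    """3-nt 3' overhang (top strand, 5'->3') left by cutting one site."""
    top, bottom, length = CUTS[enzyme]
    if len(site) != length:
        raise ValueError(f"{enzyme} sites are {length} bp long")
    return site.upper()[bottom:top]


def ends_compatible(enzyme_a: str, site_a: str, enzyme_b: str, site_b: str) -> bool:
    """The left-hand end from site_a anneals to the right-hand end from
    site_b only when the two 3-nt overhangs are identical."""
    return overhang(enzyme_a, site_a) == overhang(enzyme_b, site_b)


def self_complementary(oh: str) -> bool:
    return oh.upper() == revcomp(oh)


def all_3nt_overhangs_non_self_complementary() -> bool:
    return not any(self_complementary("".join(p)) for p in product("ACGT", repeat=3))


print(overhang("SfiI", "GGCCTCTAGGGCC"))  # -> CTA
```

For example, an SfiI site with central bases TCTAG and a BglI site with central bases ACTAA both leave CTA and can be joined, whereas two SfiI sites with different central triplets cannot, which is what makes cloning between them directional.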

In another embodiment, at least one of the restriction enzyme sites in the donor vector and/or flanking the DNA sequence of interest is a site for a type IIS enzyme, e.g., SapI. Figure 12 illustrates the preparation of a recipient vector of the invention from a donor vector and an acceptor vector using vectors with recognition sites for type IIS restriction enzymes. To employ sites for type IIS restriction enzymes in directional cloning, at least two unique sites for that restriction enzyme, and/or unique site(s) for a different restriction enzyme that generates non-self-complementary single-strand DNA overhangs complementary to the overhangs generated by the first restriction enzyme, are selected. Methylation, as well as the use of selectable and counterselectable genes, may be employed to increase the frequency of desired vectors.
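SapI recognizes the asymmetric sequence GCTCTTC and cuts outside it (GCTCTTC N1/N4), leaving a 3-nt 5' overhang drawn from the bases next to the site. EarI (CTCTTC N1/N4) cuts with the same geometry. Because these sites are not palindromic, both strands must be searched. The sketch below is a hypothetical helper, not the authors' software. It reports each site and the top-strand sequence of its overhang.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def type_iis_overhangs(seq: str, recognition: str = "GCTCTTC",
                       near: int = 1, far: int = 4) -> list[tuple[int, str, str]]:
    """Locate a type IIS site on either strand and report its overhang.

    Defaults model SapI, GCTCTTC(1/4): one strand is cut 1 nt and the other
    4 nt beyond the site, leaving a 3-nt 5' overhang. Returns
    (site_start, strand, overhang), the overhang written as top-strand bases.
    """
    seq = seq.upper()
    size = len(recognition)
    reverse_site = revcomp(recognition)
    hits = []
    for i in range(len(seq) - size + 1):
        window = seq[i:i + size]
        if window == recognition:
            # Site on the top strand: cleavage lies 3' (to the right) of it.
            start, end = i + size + near, i + size + far
            if end <= len(seq):
                hits.append((i, "+", seq[start:end]))
        elif window == reverse_site:
            # Site on the bottom strand: cleavage lies to the left on the top strand.
            start, end = i - far, i - near
            if start >= 0:
                hits.append((i, "-", seq[start:end]))
    return hits


print(type_iis_overhangs("GCTCTTCAATGCCC"))  # -> [(0, '+', 'ATG')]
```

In this example the overhang falls on ATG, illustrating how a single exchange codon can double as a start codon. Calling the function with `recognition="CTCTTC"` finds the EarI site embedded in the SapI site and reports the same overhang, consistent with the SapI/EarI pairings described in the text.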

In one embodiment, the restriction enzyme used to linearize the donor vector and the acceptor vector is the same. For instance, the donor vector has unique SapI sites flanking the DNA sequence of interest, which sites, once digested with SapI, yield non-self-complementary single-strand DNA overhangs that are complementary to the single-strand DNA overhangs generated after digestion of the acceptor vector with SapI. In another embodiment, the donor vector has unique EarI sites flanking the DNA sequence of interest, which sites, once digested with EarI, yield non-self-complementary single-strand DNA overhangs that are complementary to the single-strand DNA overhangs generated after digestion of the acceptor vector with EarI. In another embodiment, the restriction enzymes used to linearize the donor vector and the acceptor vector are different. For instance, the donor vector has unique SapI sites flanking the DNA sequence of interest, which sites, once digested with SapI, yield non-self-complementary single-strand DNA overhangs that are complementary to the single-strand DNA overhangs generated after digestion of the acceptor vector with EarI. The preparation of SapI and a corresponding methylase is disclosed in U.S. Patent No. 5,663,067.

In contrast to the use of SfiI vectors for directional cloning, which yields 12 bases (3 potential codons) at the exchange sites, the use of SapI vectors yields 3 bases (1 potential codon) at the exchange sites. Thus, SapI vectors are particularly useful in recipient vectors, as the protein encoded by the DNA sequence of interest in the recipient vector may include only two additional residues, one at the N-terminus and one at the C-terminus, e.g., a codon for methionine at the N-terminus and a residue at the C-terminus which is frequently found at or near the C-terminus of a plurality of proteins. Accordingly, proteins expressed from SapI vectors are very close in composition to their corresponding native proteins. Moreover, the overlapping sequences which form the exchange site may be chosen to correspond to codons employed at a certain frequency in a particular organism.

In another embodiment, shown in Figures 14-15, a two-enzyme approach is used for directional cloning. For the donor vector, the DNA sequence of interest is flanked by at least two restriction enzyme sites. One of the sites is for a first restriction enzyme which is an infrequent cutter of cDNAs or open reading frames in at least one species and generates single-strand DNA overhangs, while the other site is for a second restriction enzyme that is also an infrequent cutter of cDNAs or open reading frames in at least one species and generates ends that are not complementary to the ends generated by the first restriction enzyme. In one embodiment, the second restriction enzyme generates blunt ends. For instance, a donor vector has a drug resistance gene 1 and a DNA sequence of interest flanked by a site for an enzyme (enzyme I) that is an infrequent cutter of human cDNAs or open reading frames and generates a single-strand DNA overhang, e.g., SgfI, and by a site for a restriction enzyme (enzyme II) that is an infrequent cutter in that same species and generates blunt ends, e.g., PmeI. The donor vector, which optionally is an expression vector, is mixed with an acceptor vector which has a different drug resistance gene, at least two restriction enzyme sites, and optionally a counter-selectable gene. One of the sites in the acceptor vector is for a restriction enzyme (enzyme III) that generates single-strand DNA overhangs which are complementary to those generated by enzyme I, e.g., PvuI or PacI, and the other is for an enzyme (enzyme IV) which generates ends which can be ligated to the ends generated by enzyme II; e.g., enzyme IV generates blunt ends, for instance, enzyme IV is PmeI, EcoRV, BalI, or DraI. After digestion with the enzymes, ligation of the linearized donor and acceptor vectors yields a recipient vector comprising the different drug resistance gene and the DNA sequence of interest, which is joined to acceptor vector sequences via ligation of the two pairs of complementary single-strand DNA overhangs, or via ligation of complementary single-strand DNA overhangs and blunt ends.
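The compatibility rules behind the enzyme I to IV pairings can be checked mechanically from each enzyme's cut position. In the Python sketch below, the recognition sites and cut positions are the standard ones for these enzymes (SgfI GCGAT^CGC, PvuI CGAT^CG, PacI TTAAT^TAA, and the blunt cutters PmeI, EcoRV, BalI, DraI, HpaI and SwaI); the `Enzyme` type and `can_ligate` function are hypothetical. SgfI, PvuI and PacI all leave the same 3'-AT overhang, which is why PvuI or PacI can serve as enzyme III.

```python
from typing import NamedTuple


class Enzyme(NamedTuple):
    site: str
    top_cut: int  # number of top-strand bases 5' of the cut

    @property
    def bottom_cut(self) -> int:
        # Every site listed here is palindromic, so the bottom-strand cut
        # mirrors the top-strand cut.
        return len(self.site) - self.top_cut

    def end(self) -> tuple[str, str]:
        """('blunt', '') or (overhang polarity, overhang bases)."""
        if self.top_cut == self.bottom_cut:
            return ("blunt", "")
        lo, hi = sorted((self.top_cut, self.bottom_cut))
        kind = "3'" if self.top_cut > self.bottom_cut else "5'"
        return (kind, self.site[lo:hi])


ENZYMES = {
    "SgfI": Enzyme("GCGATCGC", 5),
    "PvuI": Enzyme("CGATCG", 4),
    "PacI": Enzyme("TTAATTAA", 5),
    "PmeI": Enzyme("GTTTAAAC", 4),
    "EcoRV": Enzyme("GATATC", 3),
    "BalI": Enzyme("TGGCCA", 3),
    "DraI": Enzyme("TTTAAA", 3),
    "HpaI": Enzyme("GTTAAC", 3),
    "SwaI": Enzyme("ATTTAAAT", 4),
}


def can_ligate(a: str, b: str) -> bool:
    """Ends from palindromic sites ligate when their end types match."""
    return ENZYMES[a].end() == ENZYMES[b].end()
```

For example, `can_ligate("SgfI", "PacI")` is true, while `can_ligate("SgfI", "PmeI")` is false. That mismatch is what prevents the insert from entering the acceptor in the wrong orientation.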

In one embodiment, a DNA sequence of interest is modified to contain sites for a restriction enzyme which is an infrequent cutter of cDNAs or open reading frames in at least one species and generates single-strand DNA overhangs (enzyme I), and for a restriction enzyme that is an infrequent cutter of cDNAs or open reading frames and generates blunt ends or ends that are not complementary to the ends generated by the first restriction enzyme (enzyme II) (Figure 15). The DNA sequence of interest is mixed with an oligonucleotide having sequences complementary to the site for the infrequent cutter which generates single-strand DNA overhangs and an oligonucleotide having sequences complementary to the site recognized by the infrequent cutter which generates ends that are not complementary to the ends generated by the first restriction enzyme, e.g., blunt ends, and the mixture is subjected to an amplification reaction, yielding a DNA fragment. In one embodiment, the second restriction enzyme is a blunt cutter. The sites which were added to the ends of the DNA sequence of interest, once digested, yield a single-strand DNA overhang at each end, or a single-strand DNA overhang at one end and a blunt end at the other. Single-strand DNA overhangs complementary to the overhangs generated by enzyme I, or a single-strand DNA overhang complementary to the overhang generated by enzyme I and a blunt end, are generated in an acceptor vector with restriction enzymes III and IV, respectively, yielding a linearized acceptor vector. The linearized acceptor vector, which comprises a drug resistance gene, is ligated to the digested DNA fragment to yield a recipient vector. The recipient vector contains the drug resistance gene of the acceptor vector and the DNA sequence of interest flanked by sites generated by the joining of the complementary single-strand DNA overhangs at each end, or of the complementary single-strand DNA overhangs at one end and the blunt ends at the other. The SgfI/PmeI approach can result in a recipient vector which encodes a protein with no additional residues at the N-terminus of the protein, e.g., one positioned 3' to an RBS or Kozak sequence, or which encodes a fusion protein with an N-terminal or C-terminal fusion of one or more amino acid residues (Figures 16-17 and Table I, which shows enzymes which generate blunt ends and the exchange site created by ligation of a blunt end generated by PmeI and a blunt end generated by each of those enzymes).
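The exchange site formed when a PmeI blunt end is joined to the blunt end of another enzyme can be predicted by concatenating the left half of the PmeI site with the right half of the partner site. The hypothetical Python helper below does this and also reports whether the junction recreates any of the listed sites. Only the junction bases are examined; flanking vector sequence could create or destroy additional sites. The site sequences and blunt cut positions are the standard ones; the output is not a reproduction of Table I.

```python
# Blunt cutters: recognition site and number of bases 5' of the cut.
BLUNT = {
    "PmeI": ("GTTTAAAC", 4),
    "EcoRV": ("GATATC", 3),
    "BalI": ("TGGCCA", 3),
    "DraI": ("TTTAAA", 3),
    "HpaI": ("GTTAAC", 3),
    "SwaI": ("ATTTAAAT", 4),
}


def exchange_site(left: str, right: str) -> str:
    """Junction formed by the left half of one site and the right half of another."""
    l_site, l_cut = BLUNT[left]
    r_site, r_cut = BLUNT[right]
    return l_site[:l_cut] + r_site[r_cut:]


def sites_present(seq: str) -> list[str]:
    """Names of listed enzymes whose full recognition site occurs in seq."""
    return [name for name, (site, _) in BLUNT.items() if site in seq]


for partner in ("EcoRV", "BalI", "DraI", "HpaI", "SwaI"):
    junction = exchange_site("PmeI", partner)
    print(partner, junction, sites_present(junction))
```

For example, PmeI joined to EcoRV gives GTTTATC, which neither enzyme recognizes, whereas PmeI joined to DraI gives GTTTAAA, which still contains a DraI site (TTTAAA).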
The SgfI/PmeI approach may also be used to introduce two DNA fragments of interest into the same vector (Figures 18-19). For example, a donor vector is obtained or prepared that contains a drug resistance gene 1 and a DNA sequence of interest flanked by a site for a restriction enzyme which is an infrequent cutter of cDNAs or open reading frames in at least one species and generates single-strand DNA overhangs (enzyme I), e.g., SgfI, and a site for a restriction enzyme which is an infrequent cutter of cDNAs or open reading frames and generates blunt ends (enzyme II), e.g., PmeI. An acceptor vector is prepared or obtained that contains a drug resistance gene 2; a site for a restriction enzyme (enzyme III), different from enzyme I, which generates single-strand DNA overhangs that are complementary to the overhangs in a donor vector linearized with enzyme I, e.g., PvuI; and a site for an enzyme (enzyme IV), different from enzyme II, which generates blunt ends, e.g., HpaI. The acceptor vector also includes two additional restriction sites, each of which is 5' or 3' to the DNA sequence of interest in the acceptor vector: one for a restriction enzyme (enzyme V), different from enzyme I, which generates single-strand DNA overhangs that are complementary to the overhangs generated by enzyme I, e.g., PacI, and another for a restriction enzyme that generates blunt ends (enzyme VI), which enzyme is different from enzyme II and enzyme IV, e.g., SwaI. The donor vector is linearized with enzyme I and enzyme II and ligated to an acceptor vector linearized with enzyme III and enzyme IV, to yield a recipient vector having drug resistance gene 2, the DNA sequence of interest, and sites for restriction enzymes V and VI which are both 5' or both 3' to the DNA sequence of interest. A second donor vector having a drug resistance gene and a different DNA sequence of interest flanked by a site for enzyme I and another for enzyme II is digested with enzymes I and II and mixed with the recipient vector, which is linearized with enzymes V and VI, resulting in a second recipient vector having both DNA fragments of interest. Such a recipient vector is useful to study protein-protein interactions, e.g., in two-hybrid or colocalization studies, and is particularly useful in systems in which one protein is not expressed, or is expressed only at low levels, in the absence of expression of a binding protein for that protein.
V. Libraries 
The vectors of the invention may be employed to prepare libraries of open reading frames, such as ones representing at least 10%, and up to 50% or more, of the open reading frames of the genome of a particular organism, as well as libraries of mutated open reading frames. For instance, amplification primers are designed for individual open reading frames. For the forward primer, in one embodiment, an SgfI site is placed one base 5' (upstream) of the start codon (ATG) of the open reading frame; the primer is of a length, and has sufficient sequence from the reading frame, to provide an adequate Tm for annealing the primer during amplification (e.g., > 45°C) to a template having the complement of the open reading frame. The reverse primer includes a PmeI site appended directly to the antisense of the last codon prior to the stop codon of the open reading frame. The reverse primer is of a length, and has sufficient antisense sequence from the C-terminal portion of the open reading frame, to provide an adequate Tm for annealing the primer to the template during amplification, and is preferably matched in Tm to the corresponding forward primer (e.g., > 45°C). The forward and reverse primers preferably have an additional 3 to 5 bases appended 5' to the SgfI and PmeI sites to ensure rapid digestion of the amplified open reading frames by those enzymes. The open reading frame is then amplified from a cDNA template, an RNA preparation, genomic DNA, or a plasmid clone having the open reading frame. The open reading frame is preferably amplified by a high-fidelity polymerase, e.g., Pfu DNA polymerase, especially if the amplified region is greater than 800 bp.
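The primer rules above translate directly into a small design routine. The Python sketch below illustrates them under stated assumptions and is not a validated tool:

* It estimates Tm with the simple Wallace rule (2°C per A/T, 4°C per G/C), which is only a rough approximation for short oligonucleotides.
* It uses a hypothetical 4-base 5' pad (GGAG), within the 3 to 5 extra bases described.
* It assumes the single spacer base between the SgfI site and the ATG is C.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
SGFI_SITE = "GCGATCGC"
PMEI_SITE = "GTTTAAAC"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def wallace_tm(seq: str) -> int:
    """Rough Tm in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def annealing_region(template: str, min_tm: int = 45, min_len: int = 12) -> str:
    """Shortest 5' prefix of template whose Wallace Tm exceeds min_tm."""
    for n in range(min_len, len(template) + 1):
        if wallace_tm(template[:n]) > min_tm:
            return template[:n]
    raise ValueError("template too short to reach the target Tm")


def design_orf_primers(orf: str, pad: str = "GGAG", spacer: str = "C",
                       min_tm: int = 45) -> tuple[str, str]:
    """Forward: pad + SgfI + spacer + ORF from ATG.
    Reverse: pad + PmeI + antisense starting at the last sense codon."""
    orf = orf.upper()
    if not orf.startswith("ATG") or len(orf) % 3 or orf[-3:] not in STOP_CODONS:
        raise ValueError("expected ATG...stop with a length divisible by 3")
    coding = orf[:-3]  # drop the stop codon so PmeI follows the last sense codon
    forward = pad + SGFI_SITE + spacer + annealing_region(coding, min_tm)
    reverse = pad + PMEI_SITE + annealing_region(revcomp(coding), min_tm)
    return forward, reverse
```

Because the reverse primer omits the stop codon, PmeI digestion leaves the open reading frame ending in the last sense codon followed by GTTT. This allows a C-terminal fusion or a vector-supplied stop codon after ligation.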

The amplified open reading frame may be cloned in two ways: by A-tailing, or by digestion with SgfI and PmeI and ligation to an appropriately linearized vector. In one embodiment, the amplified DNA is tailed with an additional adenine residue at each 3' end and then cloned with standard T-tailed PCR cloning vectors (e.g., pGEM®-T Easy Vector, Promega). Alternatively, topoisomerase I sites are appended to the 5' ends of the forward and reverse primers, and the PCR fragment is cloned using a TOPO®-cloning vector (e.g., pCR®-Blunt, Invitrogen, or, if also A-tailed, pCR®4-TOPO, Invitrogen). If Taq DNA polymerase is used to generate the amplified open reading frame, then A-tailing is unnecessary. For instance, the PCR fragment is treated with 0.2 mM dATP in 1X Taq reaction buffer having 5 units Taq DNA polymerase for 15 minutes at 70°C, and a small portion (e.g., 1-2 µl) is removed for a ligation reaction, e.g., with pGEM®-T Easy Vector, or for digestion with SgfI and PmeI and ligation to a vector digested with SgfI and PmeI, e.g., ACCEPT-6 (see Figure 21C). Optionally, the amplified fragment is purified prior to digestion with SgfI and PmeI, e.g., to remove the primers. Subsequent to the restriction digest, the fragment is optionally purified to remove small oligonucleotides liberated from the digested fragment.

The ligation mix is then transformed into an appropriate E. coli host, e.g., JM109, and plated on selective media, for instance LB-agar plates with 100 µg/ml ampicillin. After an overnight incubation at 37°C, the resulting colonies are picked and grown overnight in LB media supplemented with 100 µg/ml ampicillin. Plasmid DNA is then purified and screened for the appropriately sized insert, e.g., by digesting the plasmids with SgfI and PmeI and subjecting the digested plasmids to gel electrophoresis.

The process of cloning open reading frames can be done in parallel with a plurality of open reading frames of an organism or group of organisms. For example, forward and reverse primers can be provided in an arrayed format, such as in a 96-well or 384-well plate, such that the forward and reverse primers for a particular open reading frame are in the same well. Template cDNA and amplification reagents can be provided simultaneously to the whole plate, and an amplification reaction carried out in all 96 or 384 wells simultaneously. Similarly, the steps of purifying amplified DNA, optionally digesting the amplified DNA with restriction enzymes or A-tailing the amplified DNA, ligation to vectors, and transformation of E. coli can all be accomplished in 96-well or 384-well plates. The transformation mixtures can be individually plated on selective media, and after an overnight incubation at 37°C, the resulting colonies are picked and grown overnight in LB media supplemented with 100 µg/ml ampicillin. Plasmid DNA is purified and screened for the appropriately sized insert, for instance, by digesting the plasmids with SgfI and PmeI and performing gel electrophoresis. Colonies harboring plasmids with the correctly sized inserts, or isolated plasmids, can then be placed back in 96-well or 384-well plates, thus producing an arrayed collection, or library, of open reading frames. In one embodiment, the array represents 5% or more, e.g., 10% to 30%, or 70% or more, of the open reading frames of an organism or group of organisms. Alternatively, the array may contain a particular subset of open reading frames, for example, a multigene family of paralogous genes from a given organism, a group of orthologous genes from multiple organisms, a set of genes that are involved in a similar pathway (e.g., a signal transduction pathway), or a group of genes encoding functionally related gene products, including but not limited to oxidoreductases, transferases, hydrolases, lyases, isomerases or ligases, e.g., kinases, e.g., receptor or non-receptor tyrosine kinases or receptor or non-receptor serine/threonine kinases including MAP kinases, phosphatases, e.g., tyrosine phosphatases, proteases, guanylate cyclases, G-protein coupled receptors, G-protein regulators, cytochrome P450 enzymes, phospholipases, proteins for medical use, for instance, therapeutic proteins, proteins for industrial use, for instance, in biocatalysis, and the like.
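Arraying a library amounts to assigning each open reading frame a plate and a well. The minimal Python sketch below assumes row-major filling of standard 96-well (8 × 12) and 384-well (16 × 24) plates. The function names and ORF identifiers are hypothetical.

```python
import string

# Standard plate geometries: (rows, columns).
PLATE_FORMATS = {96: (8, 12), 384: (16, 24)}


def well_name(index: int, plate_size: int = 96) -> str:
    """Convert a 0-based, row-major well index to a label such as 'A1'."""
    rows, cols = PLATE_FORMATS[plate_size]
    if not 0 <= index < rows * cols:
        raise IndexError("well index outside the plate")
    row, col = divmod(index, cols)
    return f"{string.ascii_uppercase[row]}{col + 1}"


def array_orfs(orf_ids: list[str], plate_size: int = 96) -> dict[str, tuple[int, str]]:
    """Assign each ORF to (plate number, well), filling plates in order."""
    return {
        orf: (i // plate_size + 1, well_name(i % plate_size, plate_size))
        for i, orf in enumerate(orf_ids)
    }


layout = array_orfs([f"orf{i}" for i in range(100)])
print(layout["orf0"], layout["orf95"], layout["orf96"])
```

Keeping the forward and reverse primers for each open reading frame in the well given by this mapping lets every later plate-wide step (amplification, digestion, ligation, transformation) preserve the same index.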

In another embodiment, a non-arrayed library of open reading frames is employed as a source for selection or screening for a particular property, e.g., in vivo binding to a protein of interest in a yeast two-hybrid screen, or altering the expression of a gene product of an open reading frame present in the vector backbone (a coexpression system). In one embodiment, DNA from colonies grown in each well can be purified, and small aliquots from each well can be combined into one common pool to be transformed into yeast which express a protein of interest. Alternatively, a library of open reading frames is introduced into a vector which encodes a protein of interest, and clones are identified which have open reading frames encoding gene products which interact with the protein of interest or increase expression of the protein of interest. In one embodiment, the two genes which encode interacting gene products are present in a polycistronic RNA, e.g., one having an IRES.

A pooled library may also be employed for directed evolution. Thus, a particular open reading frame is mutagenized, for example, by mutagenic PCR. Each mutagenized open reading frame in the mutagenized pool has SgfI and PmeI sites at the 5' and 3' ends, respectively, of the open reading frame. The mutagenized pool is optionally purified, digested with SgfI and PmeI, optionally purified away from small oligonucleotides liberated by the restriction digests, and ligated to an appropriate vector, e.g., ACCEPT-6. The ligation mix is then transformed into an appropriate E. coli host, e.g., JM109, and plated on selective media, e.g., LB-agar plates with 100 µg/ml ampicillin. After an overnight incubation at 37°C, the resulting colonies are picked, grown overnight in 96-well or 384-well plates using selective LB media, and screened for a selected activity, e.g., an activity that is different from the activity of the gene product encoded by the corresponding nonmutagenized open reading frame. In some embodiments, multiple clones are present in each well, and sib-selection methods are employed to identify clones with a desirable characteristic(s). For example, if one well shows desirable characteristics, it can be plated on selective media, and after an overnight incubation at 37°C, the resulting colonies are picked, re-grown overnight in selective media in 96-well or 384-well plates, and rescreened for the characteristic(s).

The invention will be further described by the following non-limiting examples.

Example I 

An ampicillin-sensitive donor vector was prepared which has a green light-emitting luciferase gene flanked by SfiI sites which, after digestion, do not yield complementary single-strand DNA overhangs (Figure 20A). An ampicillin-resistant acceptor vector was also prepared which has a red light-emitting luciferase gene flanked by sites which, after digestion, do not yield complementary single-strand DNA overhangs, but each of which is complementary to one of the single-strand DNA overhangs flanking the green light-emitting luciferase gene. These two vectors were digested with SfiI in T4 DNA ligase buffer at 50°C for 1 hour. The reactions were cooled to room temperature, and T4 DNA ligase was added. The ligation reaction was conducted at 22°C for 30-60 minutes. A portion of the ligation reaction was subjected to gel electrophoresis, while another portion was used to transform JM109. The transformed cells were placed on nitrocellulose and incubated overnight.

The filter was floated on 1 ml 100 mM citrate (pH 5.5) with 1 mM luciferin potassium salt at 40°C. An image was then obtained with a CCD digital camera (Minolta Dimage 7; 4 seconds ^.5). The results show that SfiI cuts in ligase buffer, and that the cut ends religate in the presence of T4 DNA ligase (Figure 20B). To increase the proportion of desired clones, an acceptor vector containing a counterselectable marker may be employed.
Example II

Vectors 

The pDONOR-4 CAT vector was utilized as the source for the chloramphenicol acetyl transferase (CAT) reporter gene with its native promoter between the SgfI and PmeI sites. pDONOR-4 contains a kanamycin resistance gene for bacterial selection, and restriction enzyme sites SgfI and PmeI for directional and flexible cloning.

The pDONOR-6 LacZ vector was utilized as the source for the LacZ reporter gene. pDONOR-6 contains a kanamycin resistance gene for bacterial selection, a T7 bacteriophage promoter, and restriction enzyme sites SgfI and PmeI for directional and flexible cloning.

The pACCEPT-F vector (Figure 21A) was utilized as the source of the backbone sequence for the reporter genes. pACCEPT-F contains an ampicillin resistance gene for bacterial selection, a T7 bacteriophage promoter, and restriction enzyme sites SgfI and PmeI for directional and flexible cloning.
Results 

The LacZ reporter gene from pDONOR-6 LacZ was transferred to pACCEPT-F in a two-step process. First, pDONOR-6 LacZ was digested with the restriction enzymes SgfI and PmeI in Promega Buffer C with BSA at 37°C for 1 hour to free the LacZ gene from the vector. Following digestion, the restriction enzymes were inactivated by heating the reaction tube to 65°C for 20 minutes. Second, linearized pACCEPT-F, T4 DNA ligase, ATP, DTT and additional Buffer C were added to the reaction tube, and ligation was initiated by incubating the tube at 22°C for 1 hour. Following ligation, an aliquot of the reaction was transformed into E. coli cells (JM109), and the transformation mixture was plated onto Luria Broth (LB) plates containing ampicillin, X-Gal, and rhamnose. The colonies were visually screened for their ability to utilize X-Gal, thereby producing a blue color. Approximately 90% of the colonies produced a blue color, indicating the percent transfer of the LacZ gene from pDONOR-6 LacZ to the pACCEPT-F vector (percentage calculated as total # blue colonies / total # colonies × 100).
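The transfer percentages in these examples are simple ratios of scored colonies. The small Python helper below is hypothetical and for illustration only. It computes the value and rejects impossible counts.

```python
def transfer_efficiency(positive: int, total: int) -> float:
    """Percent transfer = positive colonies / colonies scored x 100.

    'positive' is, e.g., the number of blue (LacZ) or chloramphenicol-resistant
    (CAT) colonies, and 'total' the number of colonies scored.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= positive <= total:
        raise ValueError("positive must lie between 0 and total")
    return 100.0 * positive / total


print(transfer_efficiency(94, 100))  # -> 94.0
```

The counts in the usage line are illustrative. With 100 colonies re-plated, each colony corresponds to one percentage point.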

The LacZ reporter gene from pDONOR-6 LacZ was also transferred to the pDEST-F in a two-step process. First, vectors pDONOR-6 LacZ and pACCEPT-F were digested in one tube with the restriction enzymes SgfI and PmeI in Promega Buffer C with BSA at 37°C for 1 hour to free the LacZ gene from the vector. Following digestion, the restriction enzymes were inactivated by heating the reaction tube to 65°C for 20 minutes. Second, T4 DNA ligase, ATP, DTT and additional Buffer C were added to the reaction tube, and ligation was initiated by incubating the tube at 22°C for 1 hour. Following ligation, an aliquot of the reaction was transformed into E. coli cells (JM109), and the transformation mixture was plated onto LB plates containing ampicillin, X-Gal, and rhamnose. Approximately 81% of the colonies produced a blue color.

The CAT reporter gene from pDONOR-4 CAT was transferred to pACCEPT-F in a two-step process. First, pDONOR-4 CAT was digested with SgfI and PmeI in Promega Buffer C with BSA at 37°C for 1 hour to free the CAT gene from the vector. Following digestion, the restriction enzymes were inactivated at 65°C for 20 minutes. Second, linearized pACCEPT-F, T4 DNA ligase, ATP, DTT, and additional Buffer C were added to the reaction tube, and ligation was performed at 25°C for 1 hour. Following ligation, an aliquot of the reaction was transformed into E. coli JM109 bacterial cells, and the transformation mixture was plated onto LB plates with ampicillin. Of the resultant colonies, 100 were re-plated onto LB plates with chloramphenicol. Colonies which grew on chloramphenicol contained the CAT gene. Transfer efficiency of the CAT gene from pDONOR-4 CAT to the pACCEPT-F vector was determined to be approximately 94% (percentage calculated as total # CAT-resistant colonies / total # colonies tested × 100).

The CAT reporter gene from pDONOR-4 CAT was transferred to pACCEPT-F in a one-step process. To the reaction tube were added pDONOR-4 CAT, linearized pACCEPT-F, the restriction enzymes SgfI and PmeI, Promega Buffer C with BSA, T4 DNA ligase, ATP, and DTT. The restriction digest was initiated by incubating the reaction tube at 37°C for 1 hour. Following digestion, the reaction temperature was lowered to 25°C for 1 hour to allow the ligation reaction to occur. Following ligation, an aliquot of the reaction was transformed into E. coli JM109 bacterial cells, and the transformation mixture was plated onto LB plates with ampicillin. Of the resultant colonies, 100 were re-plated onto LB plates with chloramphenicol. Colonies which grew on chloramphenicol contained the CAT gene. Transfer efficiency of the CAT gene from the pENTRY-4 CAT to the acceptor vector was determined to be approximately 37%.

Example III

5 An inducible system useful for cloning including directional cloning 

includes a recombinant host cell encoding a gene product regulated by an 
inducible promoter, which gene product specifically increases transcription of a 
DNA of interest in a vector introduced to the cell. In one embodiment, a first 
vector includes the open reading frame for a gene of interest operably linked to a 

10 promoter, e.g., a T7 promoter, which vector has a transcription terminator 

sequence, for instance, the mzB terminator (to reduce aberrant expression), 5* to 
the promoter, a drag resistance gene, e.g., fein^, sequences which permit ttie 
vector to be maintained in a host cell at high copy numbers, optionally sequences 
which reduce vector multimerization, e.g., cer sequences, as well as restriction 

15 enzyme sites flanking the open reading frame, hi one embodiment, the 

restriction enzyme sites flanking the open reading frame are for two different 
infrequent cutters which do not generate complementary DNA ends (enzyme I 
and enzyme II) (Figure 21). The vector in Figure 21 also includes a T7 
transcription terminator 3' of a Pmel site. A second vector having a backbone of 

20 interest for the open reading frame, preferably contains a different drug 

resistance gene, e.g., amp\ and optionally the same transcription terminator 
sequences, promoter, sequences which permit the vector to be maintained in a 
host cell at high copy numbers, and optionally sequences which reduce vector 
multimerization as the vector containmg the open readmg frame of interest, 

25 wherein the transcription terminator sequences and promoter in the second 

vector are 5* to restriction enzyme sites for two restriction enzymes (enzyme HI 
and enzyme IV) that generate ends that are compatible with ends generated by 
enzyme I and enzyme II, respectively. For instance, enzyme I is Sgfl, enzyme n 
is Pmel, enzyme m is Pvul, and enzyme IV is Dral. In another embodiment, the 

30 restriction sites recognized by enzymes I and HI are the same, e.g., sites for Sgfl, 
and the restriction sites recognized by enzymes n and IV are the same, e.g., sites 
for Pmel, The resulting vector is introduced into a host cell which can be 
induced to express a gene product which increases transcription of the promoter 
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which is 5' to the open reading frame, e.g., a gene product such as T7 RNA 
polymerase. 
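The enzyme pairing above relies on enzyme I (SgfI) and enzyme III (PvuI) leaving matching 2-nt 3' overhangs while enzyme II (PmeI) and enzyme IV (DraI) leave blunt ends, so a fragment cut with I and II can join a vector cut with III and IV in only one orientation. The Python sketch below models this under stated assumptions: the cut positions are the commonly listed ones (GCGAT^CGC, CGAT^CG, GTTT^AAAC, TTT^AAA) supplied here for illustration, only palindromic sites are handled, and the `Enzyme` class and `compatible` helper are hypothetical. The overhang it reports for SgfI and PvuI is AT read 5' to 3', which appears to correspond to the "3' TA overhang" wording of the claims when read 3' to 5'.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Enzyme:
    name: str
    site: str        # recognition site, top strand 5'->3' (palindromic assumed)
    top_cut: int     # cut position on the top strand, from the site's 5' end
    bottom_cut: int  # cut position on the bottom strand, same top-strand coordinates

    def end(self) -> tuple[str, str]:
        """Return (end type, top-strand overhang) left by a single cut."""
        if self.top_cut == self.bottom_cut:
            return ("blunt", "")
        lo, hi = sorted((self.top_cut, self.bottom_cut))
        kind = "5'" if self.top_cut < self.bottom_cut else "3'"
        return (kind, self.site[lo:hi])


def compatible(a: Enzyme, b: Enzyme) -> bool:
    """True if an end made by `a` can be ligated to an end made by `b`."""
    kind_a, ovh_a = a.end()
    kind_b, ovh_b = b.end()
    if kind_a != kind_b:
        return False  # sticky ends never ligate to blunt ends
    return kind_a == "blunt" or ovh_a == revcomp(ovh_b)


SGFI = Enzyme("SgfI", "GCGATCGC", 5, 3)  # enzyme I
PMEI = Enzyme("PmeI", "GTTTAAAC", 4, 4)  # enzyme II
PVUI = Enzyme("PvuI", "CGATCG", 4, 2)    # enzyme III
DRAI = Enzyme("DraI", "TTTAAA", 3, 3)    # enzyme IV

print(SGFI.end(), PVUI.end())
print(compatible(SGFI, PVUI), compatible(PMEI, DRAI), compatible(SGFI, DRAI))
```

Because the sticky pair and the blunt pair cannot be joined to each other, neither end of the insert can be ligated to the wrong end of the acceptor vector, which is what makes the cloning directional.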

For example, a rhamnose-inducible system including a host cell useful to
clone and express a gene of interest was prepared. For instance, one or more
of the rhaBAD catalytic genes in JM109 are deleted, replaced or interrupted via
insertional mutagenesis. In one embodiment, the rhaB gene in JM109 was
deleted, and a vector with the rhaBAD promoter (e.g., see Egan et al., J. Mol.
Biol., 234:87 (1993) and Wilms et al., Biotechnol. Bioeng., 73:95 (2001)) linked to
the T7 RNA polymerase open reading frame was stably introduced to those cells,
yielding recombinant host cell JM109RX. A vector containing a luciferase gene
linked to the T7 promoter was introduced to JM109RX, BL21(DE3) (Novagen),
and BL21-AI (Invitrogen) cells. The transformed cells were grown at either
25°C or 37°C, then contacted with rhamnose (JM109RX), IPTG (BL21(DE3)),
or arabinose (BL21-AI), and luciferase activity was measured at various time points.

The data showed that there was a much lower level of uninduced
luciferase expression in transformed JM109RX cells than in the comparable
arabinose-inducible system. The rhamnose-inducible system may thus be
particularly useful to clone toxic genes present in a donor vector or an amplified
fragment, although the rhamnose-inducible system is not limited to the cloning
of those genes.

Moreover, the induction of luciferase activity in transformed JM109RX
cells was slow compared to luciferase activity in transformed BL21(DE3) or
BL21-AI cells, yet resulted in high final induction levels, e.g., high protein
levels, e.g., at t = 4 hours, at which RLU were 100X greater (Figures 22A
and 22C). Further, the use of a rhamnose-inducible system at 25°C yielded more
luciferase activity than at 37°C, e.g., at least 10-70 fold more at peak (Figures
22A and 22C). The observed expression profile of such a system may allow for
increased solubility of the expressed protein, e.g., due to increased time to fold.
In addition, the rhamnose-inducible system is glucose repressible. Therefore,
combinations of rhamnose and glucose may be employed to finely tune the
expression profile of a gene of interest which is linked to a rhaBAD promoter.

Example IV

A system to express a toxic gene was prepared. A stably transformed
host cell, JM109, was prepared that contained an expression vector encoding an
immunity factor for barnase, barstar, which was expressed from a constitutive
promoter, e.g., the 4c promoter, integrated into lamB. A vector containing a
lambda PL promoter linked to a truncated barnase gene (see, e.g., Accession No.
X12871 or M14442 (barnase genes from Bacillus amyloliquefaciens) or
AE007600 (a barnase gene from Clostridium acetobutylicum)), which lacked the
secretory sequence, was introduced to those stably transformed cells.

All publications, patents and patent applications are incorporated herein
by reference. While in the foregoing specification this invention has been
described in relation to certain preferred embodiments thereof, and many details
have been set forth for purposes of illustration, it will be apparent to those skilled
in the art that the invention is susceptible to additional embodiments and that
certain of the details described herein may be varied considerably without
departing from the basic principles of the invention.

WHAT IS CLAIMED IS:

1. A method for the directional subcloning of DNA fragments comprising:

a) providing a first vector comprising a first selectable marker gene and a
DNA sequence of interest, which DNA sequence of interest is flanked by at least
two restriction enzyme sites, wherein at least one of the flanking restriction
enzyme sites is a site for a first restriction enzyme which has infrequent
restriction sites in cDNAs or open reading frames from at least one species and
generates complementary single-strand DNA overhangs, wherein at least one of
the flanking restriction enzyme sites is for a second restriction enzyme which has
infrequent restriction sites in cDNAs or open reading frames from at least one
species and generates ends that are not complementary to the overhangs
generated by the first restriction enzyme, wherein digestion of the first vector
with the first restriction enzyme and the second restriction enzyme generates
a first linear DNA fragment which lacks the first selectable marker gene but
comprises the DNA sequence of interest;

b) providing a second vector comprising a second selectable marker gene
which is distinguishable from the first selectable marker gene and non-essential
DNA sequences, optionally including a counterselectable gene, which
non-essential sequences are flanked by at least two restriction enzyme sites, wherein
at least one of the flanking restriction enzyme sites in the second vector is for a
third restriction enzyme which generates complementary single-strand DNA
overhangs that are complementary to the single-strand DNA overhang generated
by the first restriction enzyme in the first linear DNA fragment, wherein at least
one of the flanking restriction sites in the second vector is for a fourth restriction
enzyme which generates ends that are not complementary to the ends generated
by the first or third restriction enzyme but can be ligated to the ends generated by
the second restriction enzyme, and wherein digestion of the second vector with
the third restriction enzyme and the fourth restriction enzyme generates a second
linear DNA fragment which lacks non-essential DNA sequences but comprises
the second selectable marker, which second linear DNA fragment is flanked by
ends which permit the oriented joining of the first linear DNA fragment to the
second linear DNA fragment; and

c) combining the first and second vectors, the first vector and the second
linear DNA fragment, or the second vector and the first linear DNA fragment in
a suitable buffer with one or more restriction enzymes and optionally DNA
ligase under conditions effective to result in digestion and optionally ligation to
yield a mixture optionally comprising a third vector comprising the first and
second linear DNA molecules which are joined in an oriented manner.

2. The method of claim 1 wherein the second restriction enzyme generates
blunt ends and the first linear DNA fragment is flanked by a first single-strand
DNA overhang and a blunt end.

3. The method of claim 1 wherein the first and third restriction enzymes are
not the same.

4. The method of claim 1 wherein the second and fourth restriction enzymes
are not the same.

5. The method of claim 1 wherein the second and fourth restriction enzymes 
generate blunt ends. 

6. The method of claim 1 wherein the first restriction enzyme is SgfI.

7. The method of claim 6 wherein the second restriction enzyme is PmeI.

8. The method of claim 1 wherein the third restriction enzyme generates a 3'
TA overhang.

9. The method of claim 8 wherein the third restriction enzyme is PvuI or
PacI.

10. The method of claim 1 wherein the DNA sequence of interest comprises
an open reading frame comprising one or more sites for the first or second
restriction enzyme.

11. The method of claim 10 wherein prior to digestion with the one or more
restriction enzymes, the sites for the one or more restriction enzymes in the open
reading frame are protected so as to prevent digestion.

12. The method of claim 11 wherein the sites are protected by methylation.

13. The method of claim 12 wherein prior to methylation the flanking sites
for the first or second restriction enzyme are contacted with an oligonucleotide
complementary to the flanking restriction enzyme site and RecA.

14. A vector system for cloning comprising:

a first vector comprising a first selectable marker gene and a DNA
sequence of interest, which DNA sequence of interest is flanked by at least two
restriction enzyme sites, wherein at least one of the flanking restriction enzyme
sites is a site for a first restriction enzyme which has infrequent restriction sites
in cDNAs or open reading frames from at least one species and generates
complementary single-strand DNA overhangs, wherein at least one of the
flanking restriction enzyme sites is for a second restriction enzyme which has
infrequent restriction sites in cDNAs or open reading frames from at least one
species and generates ends that are not complementary to the overhangs
generated by the first restriction enzyme, wherein digestion of the first vector
generates a first linear DNA fragment which lacks the first selectable marker
gene but comprises the DNA sequence of interest, wherein the restriction
enzyme sites are designed such that the first linear DNA fragment can be
religated directly to a second vector comprising a second selectable marker gene
which is distinguishable from the first selectable marker gene and non-essential
DNA sequences, optionally including a counterselectable gene, which
non-essential DNA sequences are flanked by at least two restriction enzyme sites,
wherein at least one of the flanking restriction enzyme sites in the second vector
is for a third restriction enzyme which generates complementary single-strand
DNA overhangs which are complementary to the single-strand DNA overhangs
generated by the first restriction enzyme, wherein at least one of the flanking
restriction sites in the second vector is for a fourth restriction enzyme which
generates ends that are not complementary to the ends generated by the first or
third restriction enzyme but can be ligated to the ends generated by the second
restriction enzyme, wherein digestion of the second vector with the third and
fourth restriction enzymes generates a second linear DNA fragment which lacks
the non-essential DNA sequences but comprises the second selectable marker
gene, wherein the second linear DNA fragment is flanked by ends which permit
the oriented joining of the first linear DNA fragment to the second linear DNA
fragment.

15. The vector system of claim 14 wherein the second restriction enzyme
generates blunt ends and the first linear DNA fragment is flanked by a first
single-strand DNA overhang and a blunt end.

16. The vector system of claim 14 wherein the first and third restriction
enzymes are not the same.

17. The vector system of claim 14 wherein the second and fourth restriction
enzymes are not the same.

18. The vector system of claim 14 wherein the second and fourth restriction
enzymes generate blunt ends.

19. The vector system of claim 14 wherein ligation and oriented joining
yields a third vector encoding an N-terminal fusion protein which is encoded by
the DNA sequence of interest and nucleic acid sequences 5' to the 3' end of the
second linear DNA fragment.

20. The vector system of claim 14 wherein ligation and oriented joining
yields a third vector encoding a C-terminal fusion protein which is encoded by
the DNA sequence of interest and nucleic acid sequences 3' to the 5' end of the
second linear DNA fragment.

21. The vector system of claim 14 wherein ligation and oriented joining
yields a third vector encoding a fusion protein which is encoded by the DNA
sequence of interest and nucleic acid sequences 5' and 3' to the respective 3' and
5' end of the second linear DNA fragment.

22. The vector system of claim 14 wherein ligation and oriented joining
yields a third vector encoding a fusion protein encoded by the DNA sequence of
interest and the exchange site(s) created by the oriented joining.

23. The vector system of claim 14 wherein one of the restriction enzymes is
AarI, AscI, BbrCI, CspI, DraI, FseI, NotI, NruI, PacI, PmeI, PvuI, SapI, SdaI,
SfiI, SgfI, SplI, SrfI, SwaI or a restriction enzyme which has the same
recognition site as AarI, AscI, BbrCI, CspI, DraI, FseI, NotI, NruI, PacI, PmeI,
PvuI, SapI, SdaI, SfiI, SgfI, SplI, SrfI, SwaI.

24. The vector system of claim 20, 21 or 22 wherein the fusion protein is a
GST fusion protein, GFP fusion protein, thioredoxin fusion protein, maltose
binding protein fusion protein, protease cleavage site fusion protein, metal
binding domain fusion protein or dehalogenase fusion protein.

25. The vector system of claim 20, 21 or 22 wherein the fusion protein is
more soluble, easier to purify or easier to detect relative to the corresponding
non-fusion protein.

26. A kit comprising the vector system of claim 14.

27. A method for producing a vector suitable for expression of an amino acid
sequence of interest, comprising:

combining at least two vectors in a suitable buffer with one or more
restriction enzymes and optionally DNA ligase under conditions effective to
result in digestion and optionally ligation to yield a mixture optionally
comprising a third vector, wherein a first vector comprises a first selectable
marker gene and a DNA sequence of interest, which DNA sequence of interest is
flanked by at least two restriction enzyme sites, wherein two or more of the
flanking restriction enzyme sites are sites for a first restriction enzyme which is a
hapaxoterministic restriction enzyme, wherein digestion of the first vector with
the first restriction enzyme generates a first linear DNA fragment which lacks
the first selectable marker gene but comprises the DNA sequence of interest and
a first pair of non-self-complementary single-strand DNA overhangs, wherein a
second vector comprises a second selectable marker gene which is
distinguishable from the first selectable marker gene and non-essential DNA
sequences that optionally include a counterselectable gene, which non-essential
DNA sequences are flanked by two or more restriction enzyme sites, wherein
two or more of the flanking sites in the second vector are for a second restriction
enzyme which is a hapaxoterministic restriction enzyme, wherein digestion of
the second vector with the second restriction enzyme generates a second linear
DNA fragment which lacks non-essential DNA sequences but comprises the
second selectable marker gene and a second pair of non-self-complementary
single-strand DNA overhangs, wherein each of the second pair of the
non-self-complementary DNA overhangs is complementary to only one of the
single-strand DNA overhangs of the first pair of non-self-complementary
single-strand DNA overhangs and permits the oriented joining of the first linear
DNA fragment to the second linear DNA fragment.

28. A method for producing a vector suitable for expression of an amino acid
sequence of interest, comprising:

combining at least two vectors in a suitable buffer with one or more
restriction enzymes and optionally DNA ligase under conditions effective to
result in digestion and optionally ligation to yield a mixture optionally
comprising a third vector, wherein a first vector comprises a first selectable
marker gene and a DNA sequence of interest, which DNA sequence of interest is
flanked by at least two restriction enzyme sites, wherein at least one of the
flanking restriction enzyme sites is a site for a first restriction enzyme which has
infrequent restriction sites in cDNAs or open reading frames from at least one
species and generates complementary single-strand DNA overhangs, wherein at
least one of the flanking restriction enzyme sites is for a second restriction
enzyme which has infrequent restriction sites in cDNAs or open reading frames
from at least one species and generates ends that are not complementary to the
overhangs generated by the first restriction enzyme, wherein digestion of the
first vector generates a first linear DNA fragment which lacks the first selectable
marker gene but comprises the DNA sequence of interest, wherein a second
vector comprises a second selectable marker gene which is distinguishable from
the first selectable marker gene and non-essential DNA sequences, optionally
including a counterselectable gene, which non-essential DNA sequences are
flanked by at least two restriction enzyme sites, wherein at least one of the
flanking restriction enzyme sites in the second vector is for a third restriction
enzyme which generates single-strand DNA overhangs which are
complementary to the single-strand DNA overhangs generated by the first
restriction enzyme, wherein at least one of the flanking restriction sites in the
second vector is for a fourth restriction enzyme which generates ends that
are not complementary to the ends generated by the first or third restriction
enzyme but can be ligated to the ends generated by the second restriction
enzyme, and wherein digestion of the second vector with the third and fourth
restriction enzymes generates a second linear DNA fragment which lacks the
non-essential DNA sequences but comprises the second selectable marker gene,
wherein the second linear DNA fragment is flanked by ends which permit the
oriented joining of the first linear DNA fragment to the second linear DNA
fragment.

29. The method of claim 28 wherein the second restriction enzyme generates
blunt ends and the first linear DNA fragment is flanked by a first single-strand
DNA overhang and a blunt end.

30. The method of claim 28 wherein the first and third restriction enzymes
are not the same.

31. The method of claim 28 wherein the second and fourth restriction
enzymes are not the same.

32. The method of claim 28 wherein the second and fourth restriction
enzymes generate blunt ends.

33. The method of claim 28 wherein one of the restriction enzymes is a class
IIS restriction enzyme.

34. The method of claim 33 wherein the class IIS restriction enzyme is
AccB7I, AceIII, AcPNI, AdeI, AhdI, Alw26I, AlwI, AlwNI, ApaBI, AspEI, AspI,
AsuHPI, BbsI, BbvI, BbvII, Bce83I, BceSI, BcNI, BfiI, BglI, BinI, BmrI, BpiI,
BpmI, BpuAI, BsaI, Bse3DI, BseAI, BseGI, BseLI, BseBI, BsgI, BslI, BsmAI,
BsmBI, BsmFI, BspMI, BsrDI, Bst71I, BstAPI, BstFSI, BstXI, Bsu6I, DraIII,
DrdI, DseDI, Eam1104I, Eam1105I, EarI, EchHKI, Eco31I, Eco57I, EcoNI,
Esp1396I, Esp3I, FokI, FauI, GsuI, HgaI, HphI, MboII, MsiYI, MwoI, NruGI,
PflMI, PflFI, PleI, SfaNI, TspRI, Ksp632I, MmeI, RleAI, SapI, SfiI, TaqII,
Tth111I, Tth111II, Van91I, XagI, XcmI, or a restriction enzyme which has the
same recognition site as AccB7I, AceIII, AcPNI, AdeI, AhdI, Alw26I, AlwI,
AlwNI, ApaBI, AspEI, AspI, AsuHPI, BbsI, BbvI, BbvII, Bce83I, BceSI, BcNI,
BfiI, BglI, BinI, BmrI, BpiI, BpmI, BpuAI, BsaI, Bse3DI, BseAI, BseGI, BseLI,
BseBI, BsgI, BslI, BsmAI, BsmBI, BsmFI, BspMI, BsrDI, Bst71I, BstAPI,
BstFSI, BstXI, Bsu6I, DraIII, DrdI, DseDI, Eam1104I, Eam1105I, EarI,
EchHKI, Eco31I, Eco57I, EcoNI, Esp1396I, Esp3I, FokI, FauI, GsuI, HgaI, HphI,
MboII, MsiYI, MwoI, NruGI, PflMI, PflFI, PleI, SfaNI, TspRI, Ksp632I, MmeI,
RleAI, SapI, SfiI, TaqII, Tth111I, Tth111II, Van91I, XagI, XcmI.

35. The method of claim 33 wherein one of the restriction enzymes is AvaI,
Ama87I, BcoI, BsoBI, Eco88I, AvaII, Eco47I, Bme18I, HgiEI, SinI, BanI,
AccB1I, BshNI, Eco64I, BfmI, BstSFI, SfcI, Bpu10I, BsaMI, BscCI, BsmI,
Mva1269I, Bsh1285I, BsaOI, BsiEI, BstMCI, Bse1I, BseNI, BsrI, Cfr10I, BsiI,
BssSI, Bst2BI, BsiZI, AspS9I, Cfr13I, Sau96I, BsplllOI, BlpI, Bpu1102I, CelII,
BstACI, BstDEI, DdeI, CpoI, CspI, RsrII, DsaI, BstDSI, Eco24I, BanII, EcoT38I,
FriOI, HgiJII, Eco130I, StyI, BssT1I, EcoT14I, ErhI, EspI, BlpI, Bpu1102I,
Bsp1720I, CelII, HgiAI, BsiHKAI, Alw21I, AspHI, Bbv12I, HinfI, PspPPI,
PpuMI, Psp5II, SanDI, SduI, Bsp1286I, BmyI, SecI, BsaJI, BseDI, SfcI, BfmI,
BstSFI, SmlI, or a restriction enzyme which has the same recognition site as
AvaI, Ama87I, BcoI, BsoBI, Eco88I, AvaII, Eco47I, Bme18I, HgiEI, SinI, BanI,
AccB1I, BshNI, Eco64I, BfmI, BstSFI, SfcI, Bpu10I, BsaMI, BscCI, BsmI,
Mva1269I, Bsh1285I, BsaOI, BsiEI, BstMCI, Bse1I, BseNI, BsrI, Cfr10I, BsiI,
BssSI, Bst2BI, BsiZI, AspS9I, Cfr13I, Sau96I, BsplllOI, BlpI, Bpu1102I, CelII,
BstACI, BstDEI, DdeI, CpoI, CspI, RsrII, DsaI, BstDSI, Eco24I, BanII, EcoT38I,
FriOI, HgiJII, Eco130I, StyI, BssT1I, EcoT14I, ErhI, EspI, BlpI, Bpu1102I,
Bsp1720I, CelII, HgiAI, BsiHKAI, Alw21I, AspHI, Bbv12I, HinfI, PspPPI,
PpuMI, Psp5II, SanDI, SduI, Bsp1286I, BmyI, SecI, BsaJI, BseDI, SfcI, BfmI,
BstSFI, SmlI.

36. The method of claim 1 or 28 wherein one of the restriction enzymes is
AarI, AscI, BbrCI, CspI, DraI, FseI, NotI, NruI, PacI, PmeI, PvuI, SapI, SdaI,
SfiI, SgfI, SplI, SrfI, SwaI or a restriction enzyme that has the same recognition
site as AarI, AscI, BbrCI, CspI, DraI, FseI, NotI, NruI, PacI, PmeI, PvuI, SapI,
SdaI, SfiI, SgfI, SplI, SrfI, SwaI.
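The claims repeatedly rely on enzymes that have "infrequent restriction sites in cDNAs or open reading frames". As a rough guide, and only under the assumption of random sequence with equal base frequencies, a fully specified n-bp site is expected about once every 4^n bp, so an 8-bp site such as SgfI or PmeI is expected far less often than a 6-bp site such as PvuI or DraI. Real coding sequences are not random, so this is an approximation. The Python sketch below (hypothetical helper names) also includes a simple site finder that searches one strand, which is sufficient only for palindromic sites.

```python
import re


def expected_spacing(site: str) -> int:
    """Expected bp between occurrences of a fully specified site, assuming
    random DNA with equal A/C/G/T frequencies (4**n for an n-bp site)."""
    return 4 ** len(site)


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of `site` on the given strand. The lookahead
    keeps overlapping matches; one strand suffices only for palindromic sites."""
    return [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq.upper())]


for name, site in [("SgfI", "GCGATCGC"), ("PmeI", "GTTTAAAC"),
                   ("PvuI", "CGATCG"), ("DraI", "TTTAAA")]:
    print(f"{name}: {site} -> about 1 site per {expected_spacing(site):,} bp")

print(find_sites("aagcgatcgctt", "GCGATCGC"))
```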

37. A method of inducing expression of a DNA sequence of interest in a host
cell, comprising contacting a recombinant host cell which is deficient in
rhamnose catabolism, and has a recombinant DNA molecule comprising a
rhamnose-inducible promoter operably linked to an open reading frame for a
heterologous RNA polymerase, with rhamnose and an expression vector
comprising a promoter for the heterologous RNA polymerase operably linked to
a DNA sequence of interest.

38. The method of claim 37 wherein the DNA sequence of interest is flanked
by two restriction enzyme sites, wherein one of the flanking restriction enzyme
sites is for a first restriction enzyme which has infrequent restriction sites in
cDNAs or open reading frames from at least one species and generates
single-strand DNA overhangs, and wherein another flanking restriction enzyme site is
for a second restriction enzyme which has infrequent restriction sites in cDNAs
or open reading frames from at least one species and generates ends that are not
complementary to the overhangs generated by the first restriction enzyme.

39. A method comprising introducing a vector comprising a nucleic acid
fragment encoding a barnase which lacks a secretory domain into a recombinant
host cell which expresses barstar from a promoter which is constitutively
expressed in prokaryotic cells.

40. A method comprising introducing the vector system of claim 14 into a
host cell, wherein the second vector comprises a counterselectable gene
comprising a nucleic acid fragment encoding a barnase which lacks a secretory
domain.

41. A vector comprising an open reading frame 3' to a DNA fragment of no
more than 30 base pairs, which DNA fragment comprises a ribosome binding
site, a SgfI recognition site, and a sequence which, when present in mRNA,
enhances the binding of the mRNA to the small subunit of a eukaryotic
ribosome.

42. The vector of claim 41 wherein the DNA fragment includes
AAGGAGCGATCGCX1ATGX2 (SEQ ID NO:1), and wherein X1 and X2 are
individually an A, T, G or C.
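In the claim 42 sequence, the last G of the AAGGAG ribosome binding site is also the first G of the SgfI site (GCGATCGC), which matches the one-nucleotide overlap option recited in claims 57 and 72. A minimal Python check of that overlap is sketched below; the function name and the choice X1 = C, X2 = G are hypothetical and serve only as an illustration.

```python
RBS = "AAGGAG"      # ribosome binding site as written in claim 42
SGFI = "GCGATCGC"   # SgfI recognition site


def rbs_sgfi_overlap(seq: str) -> int:
    """Number of nucleotides shared by the first RBS and the first SgfI site in seq."""
    seq = seq.upper()
    r, s = seq.find(RBS), seq.find(SGFI)
    if r < 0 or s < 0:
        raise ValueError("RBS or SgfI site not found")
    return max(0, min(r + len(RBS), s + len(SGFI)) - max(r, s))


# Claim 42 pattern AAGGAGCGATCGC X1 ATG X2 with X1 = C and X2 = G (illustrative).
example = "AAGGAG" + "CGATCGC" + "C" + "ATG" + "G"
print(example, rbs_sgfi_overlap(example))
```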

43. A vector comprising a SgfI recognition site, a sequence which comprises
ATG and which sequence, when present in mRNA, enhances the binding of the
mRNA to the small subunit of a eukaryotic ribosome, and an open reading frame
which begins at the ATG in the sequence.

44. A vector comprising a SgfI recognition site 5' to a recognition site for a
first restriction enzyme which generates blunt ends, which vector, once digested
with SgfI and the first restriction enzyme and ligated to a DNA fragment
comprising an open reading frame flanked by an end generated by a second
restriction enzyme that generates a 3' TA overhang and an end generated by a
third restriction enzyme which has infrequent restriction sites in cDNAs or open
reading frames from at least one species and generates blunt ends, yields a
recombinant vector comprising the open reading frame.

45. The vector of claim 44 wherein the first and third restriction enzymes are
the same.

46. The vector of claim 44 wherein the first and third restriction enzymes are
different.

47. The vector of claim 44 wherein the first restriction enzyme is PmeI,
EcoRV or BalI.

48. The vector of claim 44 wherein the first restriction enzyme is PmeI,
DraI, EsaBC3I, HindII, HpaI, SciI or SwaI.

49. The vector of claim 44 wherein the first restriction enzyme is AluI, BalI,
BfrBI, BsaAI, BsaBI, BsrBI, BtrI, Cac8I, CdiI, CviJI, CviRI, Eco47III, EcolSI,
EcoICRI, EcoRV, FnuDII, FspAI, HaeI, HaeIII, Hpy8I, LpnI, MlyI, MslI, MstI,
NaeI, NlaIV, NruI, NspBII, OliI, PmaCI, PmeI, PshAI, PsiI, PvuII, RsaI, ScaI,
SmaI, SnaBI, SrfI, SspI, SspD5I, StuI, XcaI, XmnI, or ZraI.

50. The vector of claim 44 wherein the restriction enzyme that generates a 3'
TA overhang is SgfI.

51. The vector of claim 44 which further comprises an open reading frame
which includes the SgfI site.

52. The vector of claim 44 which comprises a ribosome binding site 5' to the
nucleotide cleaved by SgfI.

53. The vector of claim 44 wherein ligation generates the following sequence
in the recombinant vector AAGGAGCGATCGCYATG (SEQ ID NO:69) or
X1X2X3GCGATCGCCATG (SEQ ID NO:70), wherein X1-X3, X2X3G or X3GC
is a codon which is not a stop codon, and wherein Y is A, T, G or C.

54. The vector of claim 44 wherein ligation generates the following sequence
in the recombinant vector X1X2X3GTTTY1Y2, wherein X1X2X3 is a codon in an
open reading frame which is not a stop codon and Y1 and Y2 each = A, Y1 = A
and Y2 = G, or Y1 = G and Y2 = A.

55. The vector of claim 44 wherein ligation generates the following sequence
in the recombinant vector X1X2X3GTTTY1Y2, wherein X1X2X3, X2X3G or X3GT
is a codon in an open reading frame which is not a stop codon and Y1 is not A
when Y2 is A or G, or Y1 is not G when Y2 is A.
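Claims 54 and 55 distinguish two outcomes at a PmeI-derived junction. If X1X2X3 is in frame, the sequence reads X1X2X3 GTT TY1Y2, so the third codon is TY1Y2; it is a stop codon (TAA, TAG or TGA) exactly when Y1Y2 is AA, AG or GA, which are the cases listed in claim 54, and otherwise translation reads through, as needed for a fusion. The Python sketch below checks only that in-frame case (the alternative frames mentioned in claim 55 are not modeled); the function names and the GCT codon used for X1X2X3 are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def junction_codons(x: str, y1: str, y2: str) -> list[str]:
    """Split X1X2X3 + GTTT + Y1Y2 into codons, assuming X1X2X3 is in frame."""
    junction = (x + "GTTT" + y1 + y2).upper()
    return [junction[i:i + 3] for i in range(0, len(junction), 3)]


def terminates_at_junction(x: str, y1: str, y2: str) -> bool:
    """True if the third in-frame codon (T Y1 Y2) is a stop codon (claim 54 case)."""
    return junction_codons(x, y1, y2)[2] in STOP_CODONS


for y1, y2 in [("A", "A"), ("A", "G"), ("G", "A"), ("G", "G"), ("C", "A")]:
    outcome = "stop" if terminates_at_junction("GCT", y1, y2) else "read-through"
    print(y1 + y2, outcome)
```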

56. A vector comprising a first open reading frame which includes a SgfI
recognition site and a recognition site which is not in the open reading frame for
a restriction enzyme that has infrequent restriction sites in cDNAs or open
reading frames from at least one species and generates blunt ends, which vector,
once digested with SgfI and the restriction enzyme which has infrequent
restriction sites in cDNAs or open reading frames from at least one species and
generates blunt ends, and ligated to a DNA fragment comprising a second open
reading frame flanked by a single-strand 3' TA DNA overhang and a blunt end, yields
a recombinant vector comprising a third open reading frame comprising the first
and second open reading frames, which third open reading frame encodes a
fusion peptide or protein.

57. A vector comprising a ribosome binding site which optionally overlaps
by one nucleotide with a SgfI recognition site and a recognition site which is not
in the open reading frame for a restriction enzyme that has infrequent restriction
sites in cDNAs or open reading frames from at least one species and generates
blunt ends, which vector, once digested with SgfI and the restriction enzyme that
has infrequent restriction sites in cDNAs or open reading frames from at least
one species and generates blunt ends, and ligated to a DNA fragment comprising
an open reading frame encoding a peptide or polypeptide flanked by

5' CGCCATGX1Y1 (SEQ ID NO:2)
3' TAGCGGTACX2Y2 (SEQ ID NO:71)

and a blunt end, yields a recombinant vector which encodes the peptide or
polypeptide, wherein X1 is the first codon which is 3' to the start codon for the
open reading frame, wherein X2 is the complement of X1, wherein Y1 is the
remainder of the open reading frame, and wherein Y2 is the complement of Y1.

58. The vector of claim 57 wherein X1 = GR1R2, wherein R1 or R2 = A, T, C
or G.

59. A vector comprising a recognition site for a first restriction enzyme that
generates a 3' TA overhang which is 5' to a recognition site for a second
restriction enzyme which generates blunt ends, which vector, once digested with
the first and second restriction enzymes and ligated to a DNA fragment
comprising an open reading frame flanked by an end generated by SgfI and an
end generated by a third restriction enzyme which has infrequent restriction sites
in cDNAs or open reading frames from at least one species and generates blunt
ends, yields a recombinant vector comprising the open reading frame.

60. The vector of claim 59 wherein the second and third restriction enzymes
are the same.

61. The vector of claim 59 wherein the second and third restriction enzymes
are different.

62. The vector of claim 59 wherein the second restriction enzyme is PmeI,
EcoRV or BalI.

63. The vector of claim 59 wherein the second restriction enzyme is PmeI,
DraI, EsaBC3I, HindII, HpaI, SciI or SwaI.

64. The vector of claim 59 wherein the second restriction enzyme is AluI,
BalI, BfrBI, BsaAI, BsaBI, BsrBI, BtrI, Cac8I, CdiI, CviJI, CviRI, Eco47III,
EcoUI, EcoICRI, EcoRV, FnuDII, FspAI, HaeI, HaeIII, Hpy8I, LpnI, MlyI,
MslI, MstI, NaeI, NlaIV, NruI, NspBII, OliI, PmaCI, PmeI, PshAI, PsiI, PvuII,
RsaI, ScaI, SmaI, SnaBI, SrfI, SspI, SspD5I, StuI, XcaI, XmnI or ZraI.

65. The vector of claim 59 wherein the restriction enzyme that generates a 3'
TA overhang is SgfI.

30 

66. The vector of claim 59 which further comprises an open reading frame 
which includes die recognition site for the first restriction enzyme. 
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67. The vector of claim 59 which comprises an appropriately positioned 
ribosome binding site 5' to the nucleotide cleaved by the first restriction enzyme. 

68. The vector of claim 59 wherein Ugation generates the following sequence 
5 in the recombinant vector AAGGAGCGATCGCYATG or 

X1X2X3GCGATCGCCATG, wherein X1-X3, X2X3G or X3GC is a codon which 
is not a stop codon, and wherein Y is A, T, G or C. 

69. The vector of claim 59 wherein ligation generates the following sequrace 
10 in the recombinant vector X1X2X3GTTTY1Y2, wherein X1X2X3 is a codon in an 

open reading frame which is not a stop codon and Yi and Y2 each -A, Yi = A 
andY2 = GorYi = GandY2='A. 

70. The vector of claim 59 wherein ligation generates the following sequence
in the recombinant vector X1X2X3GTTTY1Y2, wherein X1X2X3, X2X3G or X3GT
is a codon in an open reading frame which is not a stop codon and Y1 is not A
when Y2 is A or G, or Y1 is not G when Y2 is A.
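Claims 69 and 70 turn on whether the blunt-end junction places a stop codon in frame. Reading X1X2X3GTTTY1Y2 in the frame that begins at X1 gives the codons X1X2X3, GTT and TY1Y2. Because GTT encodes valine, a stop codon (TAA, TAG or TGA) can only come from X1X2X3 itself or from TY1Y2, and TY1Y2 is a stop codon exactly when Y1Y2 is AA, AG or GA, the combinations recited in claim 69. The following is a minimal Python sketch of that check. It assumes the standard genetic code and examines only the frame that begins at X1. The function names are illustrative and do not come from the claims.

```python
from __future__ import annotations

# Stop codons of the standard genetic code, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def junction_codons(x_codon: str, y1: str, y2: str) -> list[str]:
    """Split X1X2X3 + GTTT + Y1Y2 into codons in the frame that starts at X1."""
    junction = (x_codon + "GTTT" + y1 + y2).upper()
    if len(junction) != 9:
        raise ValueError("expected a 3-base codon and single bases for Y1, Y2")
    return [junction[i:i + 3] for i in range(0, 9, 3)]


def junction_has_stop(x_codon: str, y1: str, y2: str) -> bool:
    """True if any in-frame codon across the junction is a stop codon."""
    return any(codon in STOP_CODONS for codon in junction_codons(x_codon, y1, y2))


if __name__ == "__main__":
    # Enumerate every Y1Y2 after a non-stop codon (GCT, alanine).
    stops = [a + b for a in "ACGT" for b in "ACGT" if junction_has_stop("GCT", a, b)]
    print(stops)  # ['AA', 'AG', 'GA']
```

Claim 70 recites the opposite condition, in which no stop codon arises from TY1Y2. That claim also names the X2X3G and X3GT frames, which this sketch does not examine.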

71. A vector comprising a first open reading frame which includes a
recognition site for a first restriction enzyme that generates a 3' TA overhang, and
a recognition site for a second restriction enzyme that is not in the open reading
frame and generates blunt ends, which vector, once digested with the first and
second restriction enzymes and ligated to a DNA fragment comprising a second open
reading frame flanked by an end generated by SgfI and an end generated by a third
restriction enzyme which has infrequent restriction sites in cDNAs or open reading
frames from at least one species and generates blunt ends, yields a recombinant
vector comprising a third open reading frame comprising the first and second open
reading frames, which third open reading frame encodes a fusion peptide or protein.

72. A vector comprising a ribosome binding site which optionally overlaps
by one nucleotide with a SgfI recognition site, and a recognition site for a first
restriction enzyme that generates blunt ends, which vector, once digested with
SgfI and the first restriction enzyme and ligated to a DNA fragment comprising
an open reading frame encoding a peptide or polypeptide flanked by

5'   CGCCATGX1Y1 (SEQ ID NO:2)
3' TAGCGGTACX2Y2 (SEQ ID NO:71)

and a blunt end generated by a second restriction enzyme that has infrequent
restriction sites in cDNAs or open reading frames from at least one species and
generates blunt ends, yields a recombinant vector which encodes the peptide or
polypeptide, wherein X1 is the first codon which is 3' to the start codon for the
open reading frame, wherein X2 is the complement of X1, wherein Y1 is the
remainder of the open reading frame, and wherein Y2 is the complement of Y1.
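The end shown in claim 72 is the end SgfI leaves on the fragment downstream of its site. SgfI recognizes GCGATCGC and cuts each strand after GCGAT. As a result, the top strand of the downstream fragment begins with CGC, and the bottom strand extends two bases further, forming the 3' TA overhang. The following is a minimal Python sketch of that digest. It assumes an uppercase ACGT sequence and considers only the first SgfI site. The function name is illustrative.

```python
from __future__ import annotations

SGFI_SITE = "GCGATCGC"  # SgfI recognition sequence; cleaves GCGAT^CGC
TOP_CUT = 5             # top strand is cut after the 5th base of the site
# The site is palindromic, so the bottom-strand cut, expressed in top-strand
# coordinates, falls 5 bases from the site's 3' end: 8 - 5 = 3.
BOTTOM_CUT = len(SGFI_SITE) - TOP_CUT

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sgfi_right_end(top: str) -> tuple[str, str]:
    """Return the fragment to the right of the first SgfI site.

    The result is (top strand 5'->3', bottom strand 3'->5'), aligned so that
    the bottom strand's extra leading bases are the 3' overhang.
    """
    top = top.upper()
    start = top.find(SGFI_SITE)
    if start < 0:
        raise ValueError("no SgfI site found")
    top_strand = top[start + TOP_CUT:]
    bottom_strand = top[start + BOTTOM_CUT:].translate(COMPLEMENT)
    return top_strand, bottom_strand


if __name__ == "__main__":
    top, bottom = sgfi_right_end("AAGCGATCGCCATGGCT")
    print("5'   " + top)   # 5'   CGCCATGGCT
    print("3' " + bottom)  # 3' TAGCGGTACCGA
    print(bottom[:len(bottom) - len(top)])  # TA
```

With the input AAGCGATCGCCATGGCT, the two strands begin with CGCCATG and TAGCGGTAC. These match the 5' and 3' lines given for SEQ ID NO:2 and SEQ ID NO:71 in claim 72.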

73. A support comprising a plurality of recombinant vectors, two or more of
which comprise an open reading frame for a different polypeptide, wherein at
least one recombinant vector comprises a promoter and a first open reading
frame which is flanked by two exchange sites, wherein the exchange sites are
formed by ligation of

a vector comprising the promoter which is 5' to a recognition site for a
first restriction enzyme that generates a 3' TA overhang which is 5' to a
recognition site for a second restriction enzyme which generates blunt ends, which
vector is digested with the first and second restriction enzymes, and

a DNA sequence comprising the first open reading frame flanked by an
end generated by SgfI and an end generated by a third restriction enzyme which
has infrequent restriction sites in cDNAs or open reading frames from at least
one species and generates blunt ends.

74. The support of claim 73 wherein the vector further comprises a second
open reading frame 3' to the promoter, which second open reading frame
includes the recognition site for the first restriction enzyme, which second open
reading frame, when ligated to the first open reading frame, forms a third open
reading frame which encodes a fusion peptide or protein.

75. The support of claim 73 wherein ligation generates the following 
sequence in the recombinant vector AAGGAGCGATCGCYATG or 
X1X2X3GCGATCGCCATG, wherein X1X2X3, X2X3G or X3GC is a codon which
is not a stop codon, and wherein Y is A, T, G or C. 

76. The support of claim 73 wherein ligation generates the following
sequence in the recombinant vector X1X2X3GTTTY1Y2, wherein X1X2X3 is a
codon in an open reading frame which is not a stop codon and Y1 and Y2 each
= A, Y1 = A and Y2 = G, or Y1 = G and Y2 = A.

77. The support of claim 73 wherein ligation generates the following
sequence in the recombinant vector X1X2X3GTTTY1Y2, wherein X1X2X3,
X2X3G or X3GT is a codon in an open reading frame which is not a stop codon
and Y1 is not A when Y2 is A or G, or Y1 is not G when Y2 is A.

78. A recombinant vector prepared by digesting a vector comprising a SgfI
recognition site 5' to a recognition site for a first restriction enzyme which
generates blunt ends, with SgfI and the first restriction enzyme and ligating the
digested vector to a DNA fragment comprising an open reading frame flanked
by an end generated by a second restriction enzyme that generates a 3' TA
overhang and an end generated by a third restriction enzyme which has
infrequent restriction sites in cDNAs or open reading frames from at least one
species and generates blunt ends.

79. A support comprising a plurality of recombinant vectors, one or more of
which comprise a different open reading frame, wherein each recombinant
vector is prepared by digesting a vector comprising a SgfI recognition site 5' to a
recognition site for a first restriction enzyme which generates blunt ends, with
SgfI and the first restriction enzyme and ligating the digested vector to a DNA
fragment comprising an open reading frame flanked by an end generated by a
second restriction enzyme that generates a 3' TA overhang and an end generated
by a third restriction enzyme which has infrequent restriction sites in cDNAs or
open reading frames from at least one species and generates blunt ends.

80. The support of claim 79 which is a multi-well plate, the wells of which
optionally each comprise a different recombinant vector.

81. The support of claim 79 wherein the different open reading frames
include open reading frames having nucleotide substitutions of a selected open
reading frame, which different open reading frames are prepared by mutagenesis
of the selected open reading frame.

82. A method to prepare a support comprising a plurality of recombinant
vectors or recombinant cells, comprising:

a) selecting a plurality of recombinant vectors or recombinant cells
comprising recombinant vectors, wherein two or more of the recombinant
vectors comprise an open reading frame for a different polypeptide, wherein at
least one recombinant vector comprises a promoter and a first open reading
frame which is flanked by two exchange sites, wherein the exchange sites are
formed by ligation of

a vector comprising the promoter which is 5' to a recognition site for a
first restriction enzyme that generates a 3' TA overhang, which is 5' to a
recognition site for a second restriction enzyme which generates blunt ends,
which vector is digested with the first and second restriction enzymes, and

a DNA sequence comprising the first open reading frame flanked by an
end generated by SgfI and an end generated by a third restriction enzyme which
has infrequent restriction sites in cDNAs or open reading frames from at least
one species and generates blunt ends; and

b) introducing the selected recombinant vectors or recombinant cells to
one or more receptacles of the support.

83. A method to prepare a plurality of mutagenized recombinant vectors,
comprising:

a) providing DNAs comprising a plurality of mutagenized open reading
frames flanked by a SgfI recognition site and a site for a first restriction
enzyme which has infrequent restriction sites in cDNAs or open reading
frames from at least one species and generates blunt ends; and

b) digesting the DNAs with SgfI and the first restriction enzyme and
ligating the digested DNAs to a vector comprising a promoter which is 5'
to a recognition site for a second restriction enzyme that generates 3' TA
overhangs which is 5' to a recognition site for a third restriction enzyme
which generates blunt ends, which vector is digested with the second and
third restriction enzymes, to yield a plurality of mutagenized recombinant
vectors.

WO 2005/087932    PCT/US2004/031912

84. A method to prepare a plurality of mutagenized recombinant vectors,
comprising:

a) providing DNAs comprising a plurality of mutagenized open reading
frames flanked by a recognition site for a first restriction enzyme that
generates a 3' TA overhang and a site for a second restriction enzyme
which has infrequent restriction sites in cDNAs or open reading frames
from at least one species and generates blunt ends; and

b) digesting the DNAs with the first and second restriction enzymes and
ligating the digested DNAs to a vector comprising a promoter which is 5'
to a SgfI recognition site which is 5' to a recognition site for a third
restriction enzyme which generates blunt ends, which vector is digested
with SgfI and the third restriction enzyme, to yield a plurality of
mutagenized recombinant vectors.

85. The method of claim 84 wherein the first restriction enzyme is PmeI.

86. A library of recombinant cells comprising recombinant vectors or a
library of recombinant vectors, two or more of which recombinant vectors
comprise an open reading frame for a different polypeptide, wherein at least one
recombinant vector comprises a promoter and a first open reading frame which
is flanked by two exchange sites, wherein the exchange sites are formed by
ligation of

a vector comprising the promoter which is 5' to a recognition site for a
first restriction enzyme that generates a 3' TA overhang which is 5' to a
recognition site for a second restriction enzyme which generates blunt ends,
which vector is digested with the first and second restriction enzymes, and

a DNA sequence comprising the first open reading frame flanked by an
end generated by SgfI and an end generated by a third restriction enzyme which
has infrequent restriction sites in cDNAs or open reading frames from at least
one species and generates blunt ends.

87. A library of recombinant cells comprising recombinant vectors or a
library of recombinant vectors, a plurality of which comprise mutagenized
recombinant vectors comprising mutagenized open reading frames of a selected
open reading frame, wherein at least one mutagenized recombinant vector
comprises a promoter and a mutagenized open reading frame which is flanked
by two exchange sites, wherein the exchange sites are formed by ligation of

a vector comprising the promoter which is 5' to a recognition site for a
first restriction enzyme that generates a 3' TA overhang which is 5' to a
recognition site for a second restriction enzyme which generates blunt ends,
which vector is digested with the first and second restriction enzymes, and

a DNA sequence comprising the mutagenized open reading frame
flanked by an end generated by SgfI and an end generated by a third restriction
enzyme which has infrequent restriction sites in cDNAs or open reading frames
from at least one species and generates blunt ends.

88. A method to introduce at least two recognition sites for at least two
different restriction enzymes to the ends of an open reading frame, comprising:

a) providing one or more nucleic acid sequences each comprising an
open reading frame; and

b) amplifying each nucleic acid sequence with at least a pair of
oligonucleotides to yield amplified nucleic acid comprising sequences in the pair
of oligonucleotides, wherein the pair of oligonucleotides has sequences which
anneal to sequences in the one or more open reading frames, wherein the
sequences in the amplified nucleic acid corresponding to sequences in one of the
pair of oligonucleotides comprise a restriction enzyme site for SgfI which is 5' to
the sequences which anneal to the open reading frame, wherein the sequences in
the amplified nucleic acid corresponding to sequences in the other of the pair
comprise a restriction enzyme site for a first restriction enzyme which has
infrequent restriction sites in cDNAs or open reading frames from at least one
species and generates blunt ends, which first restriction enzyme site is 3' to the
sequences which anneal to the open reading frame, and wherein the sequences in
the amplified nucleic acid corresponding to sequences in the oligonucleotides are
capable of being digested with SgfI and the first restriction enzyme.
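Claims 88 and 98 describe adding the two sites by PCR. The primer for the 5' end carries an SgfI site followed by an in-frame ATG, and the primer for the 3' end carries a site for a rare, blunt-cutting enzyme. The following is a minimal Python sketch of that primer layout. The GCGATCGCC-ATG spacing follows the X1X2X3GCGATCGCCATG junction of claim 68. The sketch also makes several assumptions that the claims do not prescribe: PmeI (one of the blunt cutters listed in claims 62 and 63) as the 3' enzyme, an ORF supplied without its stop codon, a 20-base annealing region, and a hypothetical three-base GGG clamp.

```python
from __future__ import annotations

SGFI_SITE = "GCGATCGC"
SGFI_TAIL = SGFI_SITE + "C"  # gives ...GCGATCGCC followed by the ORF's ATG
PMEI_SITE = "GTTTAAAC"       # 8 bp blunt cutter, GTTT^AAAC
CLAMP = "GGG"                # hypothetical 5' extension so the enzymes can cut near the ends

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(orf: str, anneal_len: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers, both written 5'->3'.

    Assumes `orf` starts with ATG and has had its stop codon removed.
    """
    orf = orf.upper()
    if not orf.startswith("ATG") or len(orf) % 3 != 0:
        raise ValueError("ORF must start with ATG and be a whole number of codons")
    # Both sites are palindromic, so checking one strand finds internal sites.
    if SGFI_SITE in orf or PMEI_SITE in orf:
        raise ValueError("ORF has an internal SgfI or PmeI site")
    if len(orf) < anneal_len:
        raise ValueError("ORF is shorter than the annealing region")
    forward = CLAMP + SGFI_TAIL + orf[:anneal_len]
    # The reverse primer anneals to the ORF's 3' end; the product then reads
    # ...last codon + GTTTAAAC, i.e. the X1X2X3GTTT junction of claim 69.
    reverse = CLAMP + PMEI_SITE + revcomp(orf[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = design_primers("ATG" + "GCT" * 10)
    print(fwd)
    print(rev)
```

The function rejects ORFs that already contain either site. This reflects the point of choosing enzymes with infrequent sites: an internal site would cut the insert during the digestion step of claim 93.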

89. The method of claim 88 further comprising adding an adenine to the 3'
ends of the amplified nucleic acid to yield a modified amplified nucleic acid
fragment.

90. The method of claim 89 further comprising ligating the modified
amplified nucleic acid fragment to a DNA fragment having a 5' T overhang to
yield a recombinant vector.

91. The method of claim 88 wherein the pair of oligonucleotides further 
comprise a topoisomerase I binding site at the 5' end of the oligonucleotide.

92. The method of claim 91 further comprising ligating the amplified nucleic
acid fragment to a DNA fragment having blunt ends in the presence of 
topoisomerase I, to yield a recombinant vector. 

93. The method of claim 88 further comprising digesting the amplified
nucleic acid with SgfI and the first restriction enzyme and ligating the digested
amplified nucleic acid to a DNA fragment having a blunt end and an end which
is capable of ligation to an end generated by SgfI, to yield a recombinant vector.

94. The method of claim 93 wherein the amplified nucleic acid is purified 
prior to digestion. 

95. The method of claim 93 wherein the amplified nucleic acid is purified
after digestion and prior to ligation.

96. The method of claim 88 wherein the nucleic acid sequence is cDNA. 

97. The method of claim 88 wherein the nucleic acid sequence is RNA.

98. The method of claim 88 wherein the one oligonucleotide of the pair
which comprises the SgfI site includes an ATG 3' to the SgfI site which is
in-frame with the open reading frame.

99. The method of claim 88 wherein two or more different nucleic acid
sequences are amplified.

100. The method of claim 90, 92 or 93 further comprising transforming cells
with the recombinant vector to yield recombinant cells.

101. Recombinant cells prepared by the method of claim 100. 

102. The library of claim 86 or 87 wherein the at least one recombinant vector
comprises a further open reading frame flanked by two exchange sites, wherein
the exchange sites are formed by ligation of

the recombinant vector which comprises recognition sites for a fourth
and a fifth restriction enzyme 3' to the recognition site for the restriction
enzyme which generates blunt ends, wherein the fourth restriction enzyme
generates a 3' TA overhang and is different than the first restriction enzyme, and
wherein the fifth restriction enzyme generates blunt ends, which vector is
digested with the fourth and fifth restriction enzymes, and

a DNA sequence comprising the further open reading frame flanked by
an end generated by SgfI and an end generated by a sixth restriction enzyme
which has infrequent restriction sites in cDNAs or open reading frames from at
least one species and generates blunt ends.

103. A vector comprising a first open reading frame which includes a PmeI
recognition site and is flanked at the 5' end by a recognition site for a first
restriction enzyme that generates complementary single-strand DNA overhangs,
which vector, once digested with PmeI and the first restriction enzyme, and
ligated to a DNA fragment comprising a blunt end at the 5' end of a second open
reading frame and an end generated by a second restriction enzyme which
generates single-strand DNA overhangs which are complementary to the
single-strand DNA overhangs generated by the first restriction enzyme, yields a
recombinant vector comprising a third open reading frame comprising the first
and second open reading frames.

104. The vector of claim 103 wherein the third open reading frame includes
N1N2N3GTTTN4N5R (SEQ ID NO:72), wherein N1N2N3 and TN4N5 are codons
that do not code for a stop codon, and wherein R is one or more codons.

105. The vector of claim 103 wherein the blunt end of the DNA fragment is
generated by a restriction enzyme other than PmeI.

106. The vector of claim 103 wherein the blunt end of the DNA fragment is 
generated by PmeI digestion.

107. A vector comprising a first open reading frame which includes a PmeI
recognition site and is flanked at the 5' end by a site for a first restriction enzyme
that generates complementary single-strand DNA overhangs, which vector, once
digested with PmeI and the first restriction enzyme, and ligated to a DNA
fragment comprising a blunt end and an end generated by a second restriction
enzyme which generates single-strand DNA overhangs which are
complementary to the single-strand DNA overhangs generated by the first
restriction enzyme, yields a recombinant vector which includes
N1N2N3GTTTN4N5, wherein N1N2N3GTTT is a sequence from the 3' end of the
digested expression vector, wherein N1N2N3 do not code for a stop codon, and
wherein N4 and N5 = A, or N4 = A and N5 = G, or N4 = G and N5 = A.

108. The vector of claim 107 wherein the blunt end of the DNA fragment is 
generated by PmeI digestion.

109. The vector of claim 107 wherein the blunt end of the DNA fragment is
generated by a restriction enzyme other than PmeI.

110. A support comprising a plurality of recombinant vectors, two or more of
which comprise an open reading frame for a different polypeptide, wherein at
least one recombinant vector comprises a promoter and a first open reading
frame comprising a second open reading frame and one or more codons which
are in-frame with the second open reading frame, wherein the second open
reading frame is flanked by two exchange sites, wherein the exchange sites are
formed by ligation of

a DNA sequence comprising the second open reading frame which
includes a PmeI recognition site and is flanked at the 5' end by a recognition site
for a first restriction enzyme that generates complementary single-strand DNA
overhangs, which DNA sequence is digested with PmeI and the first restriction
enzyme, and

a vector comprising a blunt end at the 5' end which is 5' to the one or
more in-frame codons and the promoter which is 5' to an end generated by a
second restriction enzyme which generates single-strand DNA overhangs which
are complementary to the single-strand DNA overhangs generated by the first
restriction enzyme.

111. The support of claim 110 wherein the exchange site formed by blunt end
ligation includes N1N2N3GTTTN4N5, wherein N1N2N3GTTT is a sequence from
the 3' end of the DNA sequence, wherein if N1N2N3 do not code for a stop
codon, N4 and N5 = A, or N4 = A and N5 = G, or N4 = G and N5 = A, or wherein
N1N2N3 code for a stop codon.

112. A method to prepare a support comprising a plurality of recombinant
vectors or recombinant cells, comprising:

a) selecting a plurality of recombinant vectors or recombinant cells
comprising recombinant vectors, wherein two or more of the recombinant
vectors comprise an open reading frame for a different polypeptide, wherein at
least one recombinant vector comprises a promoter and a first open reading
frame comprising a second open reading frame and one or more codons which
are in-frame with the second open reading frame, wherein the second open
reading frame is flanked by two exchange sites, wherein the exchange sites are
formed by ligation of

a DNA sequence comprising the second open reading frame which
includes a PmeI recognition site and is flanked at the 5' end by a recognition site
for a first restriction enzyme that generates complementary single-strand DNA
overhangs, which DNA sequence is digested with PmeI and the first restriction
enzyme, and

a vector comprising a blunt end at the 5' end which is 5' to the one or
more codons and the promoter which is 5' to an end generated by a second
restriction enzyme which generates single-strand DNA overhangs which are
complementary to the single-strand DNA overhangs generated by the first
restriction enzyme; and

b) introducing the selected recombinant vectors or recombinant cells to
one or more receptacles of the support.

113. A library of recombinant cells comprising recombinant vectors or a
library of recombinant vectors, two or more of which recombinant vectors
comprise an open reading frame for a different polypeptide, wherein at least one
recombinant vector comprises a promoter and a first open reading frame
comprising a second open reading frame and one or more codons which are
in-frame with the second open reading frame, wherein the second open reading
frame is flanked by two exchange sites, wherein the exchange sites are formed
by ligation of

a DNA sequence comprising the second open reading frame which
includes a PmeI recognition site and is flanked at the 5' end by a recognition site
for a first restriction enzyme that generates complementary single-strand DNA
overhangs, which DNA is digested with PmeI and the first restriction enzyme,
and

a vector comprising a blunt end at the 5' end which is 5' to the one or
more in-frame codons and the promoter which is 5' to an end generated by a
second restriction enzyme which generates single-strand DNA overhangs which
are complementary to the single-strand DNA overhangs generated by the first
restriction enzyme.

114. A library of recombinant cells comprising recombinant vectors or a
library of recombinant vectors, a plurality of which recombinant vectors
comprise mutagenized recombinant vectors comprising mutagenized open
reading frames of a selected open reading frame, wherein at least one
recombinant vector comprises a promoter and an open reading frame comprising
a mutagenized open reading frame and one or more codons which are in-frame
with the mutagenized open reading frame, wherein the mutagenized open
reading frame is flanked by two exchange sites, wherein the exchange sites are
formed by ligation of

a DNA sequence comprising the mutagenized open reading frame which
includes a PmeI recognition site and is flanked at the 5' end by a recognition site
for a first restriction enzyme that generates complementary single-strand DNA
overhangs, which DNA sequence is digested with PmeI and the first restriction
enzyme, and

a vector comprising a blunt end at the 5' end which is 5' to the one or
more in-frame codons and the promoter which is 5' to an end generated by a
second restriction enzyme which generates single-strand DNA overhangs which
are complementary to the single-strand DNA overhangs generated by the first
restriction enzyme.

115. A support comprising a plurality of recombinant vectors, two or more of
which comprise an open reading frame for a different polypeptide, wherein at
least one recombinant vector comprises a promoter and an open reading frame
which is flanked by two exchange sites, wherein the exchange sites are formed
by ligation of

a DNA sequence comprising the open reading frame which is flanked by
at least two restriction enzyme sites for a first restriction enzyme which is a
hapaxoterministic restriction enzyme, which DNA sequence is digested with the
first restriction enzyme to generate a first DNA fragment flanked by a first pair
of non-self-complementary single-strand DNA overhangs, and

a vector comprising the promoter and non-essential DNA sequences that
are flanked by two restriction enzyme sites for a second restriction enzyme
which is a hapaxoterministic restriction enzyme, which vector is digested with
the second restriction enzyme to generate a second DNA fragment which lacks
non-essential DNA sequences and is flanked by a second pair of
non-self-complementary single-strand DNA overhangs, wherein each of the
second pair of the non-self-complementary DNA overhangs is complementary to
only one of the single-strand DNA overhangs of the first pair of
non-self-complementary single-strand DNA overhangs.
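The overhang conditions in claim 115, which recur in claims 120 and 127 to 130, can be checked mechanically. The following is a minimal Python sketch of that check. It makes three assumptions: each overhang is given as a 5'→3' string, two overhangs anneal only when one is the exact reverse complement of the other (real ligations tolerate some mismatches, which this ignores), and the overhang sequences in the example are hypothetical. The function names are illustrative.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_self_complementary(overhang: str) -> bool:
    """True for an overhang that can anneal to a copy of itself (e.g. GATC)."""
    return overhang == revcomp(overhang)


def overhangs_compatible(insert_pair: tuple[str, str],
                         vector_pair: tuple[str, str]) -> bool:
    """Check the overhang rules recited for the two digested fragments.

    No overhang may be self-complementary, and each vector overhang must
    anneal to exactly one insert overhang, a different one for each end.
    """
    if any(is_self_complementary(o) for o in (*insert_pair, *vector_pair)):
        return False
    partners = []
    for v in vector_pair:
        matches = [i for i in insert_pair if i == revcomp(v)]
        if len(matches) != 1:  # must pair with exactly one insert overhang
            return False
        partners.append(matches[0])
    # The two vector ends must pair with different insert ends.
    return partners[0] != partners[1]


if __name__ == "__main__":
    print(overhangs_compatible(("TGG", "GCA"), ("CCA", "TGC")))  # True
    print(overhangs_compatible(("TGG", "GCA"), ("CCA", "CCA")))  # False
```

In the example, insert overhangs TGG and GCA pair one-to-one with vector overhangs CCA and TGC, so the check passes. A vector pair of CCA and CCA fails because both of its ends would pair with TGG.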

116. The support of any one of claims 73 to 77, 110 to 111 or 114 which is a
multi-well plate.

117. The support of any one of claims 73 to 77, 110 to 111 or 114 wherein the
plurality of recombinant vectors each encode a different polypeptide from the
same organism.

118. The support of any one of claims 73 to 77, 110 to 111 or 114 wherein the
plurality of recombinant vectors encode orthologous polypeptides.

119. The support of any one of claims 73 to 77, 110 to 111 or 114 wherein the
plurality of recombinant vectors encode paralogous polypeptides.

120. A method to prepare a support comprising a plurality of recombinant
vectors or recombinant cells, comprising:

a) selecting a plurality of recombinant vectors or recombinant cells
comprising recombinant vectors, wherein two or more of the recombinant
vectors comprise an open reading frame for a different polypeptide, wherein at
least one recombinant vector comprises a promoter and an open reading frame
which is flanked by two exchange sites, wherein the exchange sites are formed
by ligation of

a DNA sequence comprising the open reading frame which is flanked by
at least two restriction enzyme sites for a first restriction enzyme which is a
hapaxoterministic restriction enzyme, which DNA sequence is digested with the
first restriction enzyme to generate a first DNA fragment flanked by a first pair
of non-self-complementary single-strand DNA overhangs, and

a vector comprising the promoter and non-essential DNA sequences that
are flanked by two restriction enzyme sites for a second restriction enzyme
which is a hapaxoterministic restriction enzyme, which vector is digested with
the second restriction enzyme to generate a second DNA fragment which lacks
non-essential DNA sequences and is flanked by a second pair of
non-self-complementary single-strand DNA overhangs, wherein each of the
second pair of the non-self-complementary DNA overhangs is complementary to
only one of the single-strand DNA overhangs of the first pair of
non-self-complementary single-strand DNA overhangs; and

b) introducing the selected recombinant vectors or recombinant cells to
one or more receptacles of the support.

121. The method of any one of claims 82, 112 or 120 wherein each of the
selected recombinant vectors encodes a different paralogous protein. 

122. The method of any one of claims 82, 112 or 120 wherein each of the
selected recombinant vectors encodes a different protein in a catabolic pathway.

123. The method of any one of claims 82, 112 or 120 wherein each of the
selected recombinant vectors encodes a different protein in a biosynthetic
pathway.

124. The method of any one of claims 82, 112 or 120 wherein each of the 
selected recombinant vectors encodes a different protease. 

125. The method of any one of claims 82, 112 or 120 wherein each of the
selected recombinant vectors encodes a protein from the same organism.

126. The method of any one of claims 82, 112 or 120 wherein each of the
selected recombinant vectors encodes orthologous proteins. 

127. A method to prepare a plurality of mutagenized recombinant vectors,
comprising:

a) providing DNAs comprising a plurality of mutagenized open reading
frames flanked by two restriction enzyme sites for a first restriction enzyme
which is a hapaxoterministic restriction enzyme and generates a first pair of
non-self-complementary single-strand DNA overhangs; and

b) digesting the DNAs with the first restriction enzyme and ligating the
digested DNAs to a vector comprising a promoter and non-essential DNA
sequences flanked by two restriction enzyme sites for a second restriction
enzyme which is a hapaxoterministic restriction enzyme, which vector is
digested with the second restriction enzyme generating a DNA fragment which
lacks non-essential DNA sequences but comprises a second pair of
non-self-complementary single-strand DNA overhangs, wherein each of the
second pair of the non-self-complementary DNA overhangs is complementary to
only one of the single-strand DNA overhangs of the first pair of
non-self-complementary single-strand DNA overhangs, to yield a plurality of
mutagenized recombinant vectors.

128. A support comprising a plurality of mutagenized recombinant vectors
prepared by the method of any one of claims 83, 84, or 127.

129. A library of recombinant cells comprising recombinant vectors or a
library of recombinant vectors, two or more of which recombinant vectors
comprise an open reading frame for a different polypeptide, wherein at least one
recombinant vector comprises a promoter and an open reading frame which is
flanked by two exchange sites, wherein the exchange sites are formed by ligation
of

a DNA sequence comprising the open reading frame which is flanked by
at least two restriction enzyme sites for a first restriction enzyme which is a
hapaxoterministic restriction enzyme, which DNA sequence is digested with the
first restriction enzyme to generate a first DNA fragment flanked by a first pair
of non-self-complementary single-strand DNA overhangs, and

a vector comprising the promoter and non-essential DNA sequences that
are flanked by two restriction enzyme sites for a second restriction enzyme
which is a hapaxoterministic restriction enzyme, which vector is digested with
the second restriction enzyme to generate a second DNA fragment which lacks
non-essential DNA sequences and is flanked by a second pair of
non-self-complementary single-strand DNA overhangs, wherein each of the
second pair of the non-self-complementary DNA overhangs is complementary to
only one of the single-strand DNA overhangs of the first pair of
non-self-complementary single-strand DNA overhangs.

130. A library of recombinant cells comprising recombinant vectors or a
library of recombinant vectors, a plurality of which recombinant vectors
comprise mutagenized recombinant vectors comprising open reading frames of a
selected open reading frame, wherein at least one recombinant vector comprises
a promoter operably linked to the mutagenized open reading frame which is
flanked by two exchange sites, wherein the exchange sites are formed by ligation
of

a DNA sequence comprising the mutagenized open reading frame which
is flanked by at least two restriction enzyme sites for a first restriction enzyme
which is a hapaxoterministic restriction enzyme, which DNA sequence is
digested with the first restriction enzyme to generate a first DNA fragment
flanked by a first pair of non-self-complementary single-strand DNA overhangs,
and

a vector comprising the promoter and non-essential DNA sequences that
are flanked by two restriction enzyme sites for a second restriction enzyme
which is a hapaxoterministic restriction enzyme, which vector is digested with
the second restriction enzyme to generate a second DNA fragment which lacks
non-essential DNA sequences and is flanked by a second pair of
non-self-complementary single-strand DNA overhangs, wherein each of the
second pair of the non-self-complementary DNA overhangs is complementary to
only one of the single-strand DNA overhangs of the first pair of
non-self-complementary single-strand DNA overhangs.

131. A method for performing genetic analysis, comprising:

a) populating a database of genetic data with a plurality of genetic records;

b) querying the database of genetic data to identify a first subset of genetic records, wherein each record has at least one recognition site for one predetermined restriction enzyme or for restriction enzymes included in a set of predetermined restriction enzymes; and

c) determining a set of statistics associated with the restriction enzyme recognition sites for at least a second subset of genetic records in the first subset.



132. The method of claim 131 wherein determining the set of statistics 
includes determining a number of genetic records including recognition sites for 




wo 2005/087932 



PCT/US2004/031912 



one predetermined restriction enzyme or for each of the predetermined restriction enzymes in the set.

133. The method of claim 131 wherein determining the set of statistics includes determining a number of occurrences of at least one site for the one predetermined restriction enzyme or for the predetermined restriction enzymes in a genetic record in the second subset.

134. The method of claim 131 wherein the genetic records comprise nucleic acid sequences.

135. The method of claim 131 further comprising filtering the subset of genetic records to include or exclude genetic records having one or more selected characteristics.

136. The method of claim 131 further comprising filtering the subset of genetic records to exclude genetic records having a size greater than a predetermined value.

137. The method of claim 136 wherein the predetermined value is 21000 characters.

138. The method of claim 131 further comprising determining the sequence of specific bases which are present as ambiguous bases within a recognition site or which are present between a recognition site for a restriction enzyme and the position at which the restriction enzyme cleaves DNA containing the recognition site.

139. The method of claim 131 wherein at least one of the restriction enzymes has a 6 bp, 7 bp or 8 bp recognition site.

140. The method of claim 131 wherein at least one of the restriction enzymes is a hapaxoterministic restriction enzyme.

141. A computerized system for genetic analysis, comprising:
a database of genetic data;
a processor;
a set of one or more programs executed by the processor causing the processor to:
query the database of genetic data to identify a first subset of genetic records, wherein each record has at least one recognition site for one predetermined restriction enzyme or for restriction enzymes included in a set of predetermined restriction enzymes; and
determine a set of statistics associated with the restriction enzyme recognition sites for at least a second subset of genetic records in the first subset.
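Purely as an illustration of the overhang conditions recited in claims 129 and 130 (not part of the claimed subject matter), the Python sketch below models each single-strand overhang as a 5' to 3' string of A, C, G and T. It assumes that an overhang is self-complementary when it equals its own reverse complement, and that two overhangs of the same polarity can anneal when one is the reverse complement of the other. All function names are hypothetical, and `directional_pairing` reads the "complementary to only one" language as also requiring the two vector ends to pair with different insert ends.

```python
# Sketch only: overhangs are 5'->3' strings over A, C, G, T (an assumption).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_self_complementary(overhang: str) -> bool:
    """True if the overhang could anneal to another copy of itself."""
    return overhang.upper() == revcomp(overhang)


def anneals(a: str, b: str) -> bool:
    """True if two same-polarity overhangs are complementary."""
    return a.upper() == revcomp(b)


def directional_pairing(insert_ends, vector_ends) -> bool:
    """No end is self-complementary and each vector end pairs with exactly
    one insert end, a different one for each vector end."""
    if any(is_self_complementary(e) for e in (*insert_ends, *vector_ends)):
        return False
    partners = []
    for v in vector_ends:
        matches = [k for k, i in enumerate(insert_ends) if anneals(v, i)]
        if len(matches) != 1:
            return False
        partners.append(matches[0])
    return len(set(partners)) == len(partners)


# Hypothetical 3-nt ends: insert (AGG, TCA), vector (CCT, TGA).
print(directional_pairing(("AGG", "TCA"), ("CCT", "TGA")))  # True
```

Because the middle base of an odd-length overhang would have to be its own complement, no 3-nt overhang (such as those left by Sfi I or Sap I in FIG. 1) can be self-complementary, whereas a four-base end such as GATC is.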






TWO TYPES OF HAPAXOMERS

• INTERNAL PALINDROMIC TYPE II ENZYMES
-e.g., Sfi I

GGCCNNNN^NGGCC
CCGGN^NNNNCCGG

• OUTSIDE CUTTERS (TYPE IIS)
-e.g., Sap I

GCTCTTCN^NNN
CGAGAAGNNNN^

FIG. 1
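The cut positions drawn in FIG. 1 can be turned into a small simulation that shows why these enzymes are hapaxoterministic: the overhang is copied from bases that the recognition sequence does not fix, so sites with different spacers yield different ends. In the Python sketch below, the `ENZYMES` table and `cut_sites` function are hypothetical; the cut offsets are read off FIG. 1 (Sfi I GGCCNNNN^NGGCC, Sap I GCTCTTCN^NNN). Only the top strand is scanned, so a Sap I site in the reverse orientation (GAAGAGC) is not detected.

```python
import re

# Hypothetical enzyme table: site regex, top-strand cut and bottom-strand cut,
# both counted in top-strand coordinates from the first base of the site.
ENZYMES = {
    "SfiI": (r"GGCC[ACGT]{5}GGCC", 8, 5),  # GGCCNNNN^NGGCC / CCGGN^NNNNCCGG
    "SapI": (r"GCTCTTC", 8, 11),           # GCTCTTCN^NNN   / CGAGAAGNNNN^
}


def cut_sites(seq, enzyme):
    """Return (top-strand cut index, overhang bases read on the top strand,
    overhang polarity) for every site on the top strand of seq."""
    pattern, top, bottom = ENZYMES[enzyme]
    seq = seq.upper()
    results = []
    # A zero-width lookahead lets overlapping sites be reported.
    for m in re.finditer(f"(?=(?:{pattern}))", seq):
        start = m.start()
        lo, hi = sorted((start + top, start + bottom))
        if hi > len(seq):
            continue  # the cut would fall past the end of the sequence
        polarity = "3'" if top > bottom else "5'"
        results.append((start + top, seq[lo:hi], polarity))
    return results


print(cut_sites("TTGGCCACGTAGGCCTT", "SfiI"))  # [(10, 'CGT', "3'")]
print(cut_sites("TTGGCCAGCAAGGCCTT", "SfiI"))  # [(10, 'GCA', "3'")]
print(cut_sites("AAGCTCTTCAGGTCCC", "SapI"))   # [(10, 'GGT', "5'")]
```

Changing only the Sfi I spacer from ACGTA to AGCAA changes the overhang from CGT to GCA while the fixed recognition bases stay the same.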



CAGNNNCTG
GTCNNNGAC

FIG. 2A



GGATGNNNNNNNNNNNNN
CCTACNNNNNNNNNNNNN^

NNNNNNNNNNNNNCATCC
NNNNNNNNNNNNNGTAGG

FIG. 2B






300

302 POPULATE DATABASE

QUERY TO CREATE SUBSET COMPRISING RECORDS HAVING AT
LEAST ONE RECOGNITION SITE FOR A RESTRICTION ENZYME

FILTER SUBSET

308 DETERMINE STATISTICS FOR RECORDS

310 DETERMINE NUMBER OF RECORDS HAVING EACH OF A SET OF RESTRICTION ENZYME TARGET SITES

DETERMINE NUMBER OF RESTRICTION SITES IN RECORD

DETERMINE STATISTICS FOR AMBIGUOUS POSITIONS OF RESTRICTION ENZYMES

FIG. 3
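The steps of FIG. 3 (and of claims 131 to 138) can be sketched in Python under several simplifying assumptions. An in-memory dictionary of record IDs to sequences stands in for the database. Only the given strand is searched, which is adequate for palindromic sites such as Sfi I or Bgl I; non-palindromic sites would also need a reverse-complement search. The ambiguous-base statistics cover only positions inside the recognition site, not the bases between a Type IIS site and its cleavage position that claim 138 also mentions. All names are hypothetical; the 21000-character default follows claim 137.

```python
import re
from collections import Counter

# IUPAC codes needed for the sites in FIGS. 5A and 11 (a subset; an assumption).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}


def site_regex(site):
    """Compile a site; each ambiguous position becomes its own capture group."""
    parts = [f"({IUPAC[b]})" if len(IUPAC[b]) > 1 else b for b in site.upper()]
    return re.compile("(?=" + "".join(parts) + ")")


def analyze(records, enzymes, max_len=21000):
    """Query, filter and compute statistics for a {record_id: sequence} dict."""
    patterns = {name: site_regex(site) for name, site in enzymes.items()}
    # Query: keep records with at least one site for any enzyme in the set.
    subset = {rid: s.upper() for rid, s in records.items()
              if any(p.search(s.upper()) for p in patterns.values())}
    # Filter: drop records longer than max_len characters (cf. claims 136-137).
    subset = {rid: s for rid, s in subset.items() if len(s) <= max_len}

    records_with_site = Counter()                      # records containing each site
    sites_per_record = {}                              # (record, enzyme) -> site count
    ambiguous = {name: Counter() for name in enzymes}  # bases seen at ambiguous positions
    for rid, s in subset.items():
        for name, pat in patterns.items():
            hits = list(pat.finditer(s))
            sites_per_record[(rid, name)] = len(hits)
            if hits:
                records_with_site[name] += 1
            for m in hits:
                ambiguous[name]["".join(m.groups())] += 1
    return subset, records_with_site, sites_per_record, ambiguous


records = {
    "r1": "TTGGCCACGTAGGCCTT",
    "r2": "AAAAAAAA",
    "r3": "GGCCAAAAAGGCCGGCCTTTTTGGCC",
    "r4": "GGCCACGTAGGCC" + "A" * 21000,  # has a site but exceeds max_len
}
enzymes = {"SfiI": "GGCCNNNNNGGCC", "BglI": "GCCNNNNNGGC"}
subset, with_site, per_record, ambiguous = analyze(records, enzymes)
print(sorted(subset))  # ['r1', 'r3']
print(with_site, ambiguous["SfiI"])
```

Keeping the ambiguous bases as capture groups means the spacer counts fall out of the same scan that counts the sites, so no second pass over the records is needed.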






7+ Cutters

Enzymes     Recognition Sequence                   HsFna    MGC     Ec     Ce     Sc     At
AarI        CACCTGCNNNN^NNNN                        7142   5355                        1138
AbeI        CC^TCA_GC             not available     7970   5836    141     90    374   1833
AscI        GG^CGCG_CC                               515    336    152     10     13     26
AsiSI       GCG_AT^CGC                               108     62    207     39     29    178
BbvCI       CC^TCA_GC                               7970   5836    141     90    374   1833
CciNI       GC^GGCC_GC                              1444    823     19     33     31     97
CpoI        CG^GWC_CG                               1119    781    347
CspI        CG^GWC_CG                               1119    781    347
CspBI       GC^GGCC_GC            not available     1444    823     19     33     31     97
FseI        GG_CCGG^CC                              1139    740      5      9     10     70
MabI        A^CCWGG_T
MchAI       GC^GGCC_GC            not available     1444    823     19     33     31     97
Mlu1106I    RGGWCCY               not available
NotI        GC^GGCC_GC                              1444    823     19     33     31     97
PacI        TTA_AT^TAA                               708    395     66      8    213    138
(illegible) RG^GWC_CY             not available
PpuMI       RG^GWC_CY
PpuXI       RG^GWC_CY
Psp5II      RG^GWC_CY
PspPPI      RG^GWC_CY
RsrII       CG^GWC_CG                               1119    781    347
Rsr2I       CG^GWC_CG                               1119    781    347
SanDI       GG^GWC_CC
SapI        GCTCTTCN^NNN                            7260   4785    584   1296   1362   8870
SbfI        CC_TGCA^GG                              2591   1802     60     13     66    251
SdaI        CC_TGCA^GG                              2591   1802     60     13     66    251
SdiI        GGCCN_NNN^NGGCC       not available     2214   1634     28     18     54    121
SexAI       A^CCWGG_T
SfiI        GGCCN_NNN^NGGCC                         2214   1634     28     18     54    121
SgfI        GCG_AT^CGC                               108     62    207     39     29    178
SgrAI       CR^CCGG_YG
Sse232I     CG^CCGG_CG            not available      708    448     29     43     23    446
Sse1825I    GG^GWC_CC             not available
Sse8387I    CC_TGCA^GG                              2591   1802     60     13     66    251
Sse8647I    AG^GWC_CT             not available
VpaK32I     GCTCTTCN^NNN          not available     7260   4785    584   1296   1362   8870

FIG. 5A
DRUG1
DRUG2
H-RE, Lig

FIG. 6A

DRUG2
DRUG1
DRUG2
H-RE, Lig

FIG. 6B

DRUG2
DRUG1
LETHAL GENE
DRUG2
H-RE, Lig

FIG. 6B

DRUG2
Sfi I

• HOW TO MAKE Sfi I "ONE WAY"
-METHYLASES

-Bgl I, NOT Sfi I SITES, IN ACCEPTOR VECTORS

GGCCNNNN^NGGCC
CCGGN^NNNNCCGG

GCCNNNN^NGGC
CGGN^NNNNCCG

-LETHAL GENES IN STUFFER FRAGMENTS

FIG. 7
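FIG. 7 lists Bgl I sites, rather than Sfi I sites, in acceptor vectors as one way to make Sfi I transfers one-way. The Python sketch below checks two properties that this entry appears to rely on (our reading, not stated in the figure). First, the Bgl I site GCCNNNN^NGGC lies inside every Sfi I site and cuts at the same top-strand position. Its bottom-strand cut also coincides, so both enzymes release the same 3-nt 3' overhang. Second, a junction that joins a Bgl I-cut acceptor end lacking the outer G to an Sfi I-cut donor end still contains a Bgl I site but no Sfi I site. The sequences and function name are hypothetical examples.

```python
import re

# Recognition sites from FIG. 7 as lookahead regexes, so overlapping sites are found.
SFI = re.compile(r"(?=GGCC[ACGT]{5}GGCC)")  # GGCCNNNN^NGGCC
BGL = re.compile(r"(?=GCC[ACGT]{5}GGC)")    # GCCNNNN^NGGC


def top_cuts(seq, pattern, offset):
    """Top-strand cut positions for every site matched by pattern."""
    return {m.start() + offset for m in pattern.finditer(seq.upper())}


donor = "AAGGCCACGTAGGCCAA"  # hypothetical sequence carrying a full Sfi I site
print(top_cuts(donor, SFI, 8), top_cuts(donor, BGL, 7))  # {10} {10}

# Hypothetical junction: acceptor half "TTGCCACGT" (T, not G, before GCC)
# ligated to donor half "AGGCCTT"; both ends carry the same CGT overhang.
junction = "TTGCCACGT" + "AGGCCTT"
print(bool(SFI.search(junction)), bool(BGL.search(junction)))  # False True
```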
PCR
B
RecA COATED OLIGOS
M. Hae III + SAM
WIZARD
DRUG1
Bgl I, LIGASE
B DRUG1

FIG. 9A
PCR
B'
B
RecA COATED OLIGOS
M. Sfi I + SAM
WIZARD
DRUG1
Sfi I, LIGASE
B DRUG1

FIG. 9B
DRUG1
B
RecA COATED OLIGOS
DRUG1
M. Hae III + SAM
DRUG1
WIZARD
Bgl I CUT
DRUG1
DRUG2
Bgl I, LIGASE
DRUG2

FIG. 10A
DRUG1
RecA COATED OLIGOS
DRUG1
M. Sfi I + SAM
DRUG1
WIZARD
Bgl I CUT
DRUG1
DRUG2
Sfi I, LIGASE

FIG. 10B
REs THAT CAN MAKE Sfi I ONE-WAY

3' OVERHANG   RESTRICTION ENZYME   RECOGNITION SEQUENCE
CNG           FmuI                 G_GNC^C
CNG           PssI                 RG_GNC^CY
CWG           Psp03I               G_GWC^C
GNC           BthCI                G_CNG^C
GSC           TauI                 G_CSG^C
NNN           AlwNI                CAG_NNN^CTG
NNN           BglI                 GCCN_NNN^NGGC
NNN           BsiYI                CCNN_NNN^NNGG
NNN           BstAPI               GCAN_NNN^NTGC
NNN           DraIII               CAC_NNN^GTG
NNN           MwoI                 GCNN_NNN^NNGC
NNN           PflMI                CCAN_NNN^NTGG
NNN           RleAI                CCCACANNNNNNNNN_NNN^
NNN           SfiI                 GGCCN_NNN^NGGCC

FIG. 11
Sap I

• HOW TO MAKE Sap I "ONE WAY"
-METHYLASES

-ORIENTATION OF SITES IN VECTOR BACKBONE
IN DONOR VECTOR AND IN ACCEPTOR VECTOR

-LETHAL GENES IN STUFFER FRAGMENTS

-Ear I, NOT Sap I SITES,
IN ACCEPTOR VECTORS

GCTCTTCN^NNN
CGAGAAGNNNN^

CTCTTCN^NNN
GAGAAGNNNN^

• KEY ADVANTAGE OF Sap I
-ONLY THREE BASES PER EXCHANGE SITE
LEFT IN ACCEPTOR VECTOR

FIG. 13
TWO ENZYME APPROACH

• Sgf I -

CUTTER OF HUMAN cDNAs, TWO BASE 3'
OVERHANG

G C G A T^C G C
C G C^T A G C G

• Pme I -

CUTTER, BLUNT END CUTTER

G T T T^A A A C
C A A A^T T T G

FIG. 14A

FIG. 14B
AT
OH
PCR
DRUG1
Sfi I, LIGASE
Pvu I OR Pac I NOT Pme
AT
Sgf I, Pme I
AT VECTOR
DONOR VECTOR
Sgf I, Pme I, LIGASE
DRUG1
DRUG2

FIG. 15
N-TERMINAL Sgf I SITE CAN ALLOW
N-TERMINAL FUSION OR NO FUSION

NAAGGAGCGATCGCCATGg
  -RBS-      -Kozak-

NAAGGAGCGATCGCCATG
KEQGlyAlaIleAlaMet

FIG. 16

C-TERMINAL Pme I SITE ALLOWS TERMINATION
(+1AA) OR C-TERMINAL FUSIONS

NNNGTTTAAACN
XaaValTer

NNNGTTTATCN with EcoRV
XaaValTyr

NNNGTTTCCAN with Bal I, etc.
XaaValSer

FIG. 17
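The reading frames annotated in FIGS. 16 and 17 can be checked with a short Python translation sketch using the standard genetic code, where '*' marks a stop and 'X' marks any codon containing N. The compact 64-letter table is a common encoding of the standard code, not something taken from this document. The junction sequences are the ones printed in the figures.

```python
# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def translate(seq):
    """Translate frame 1 of seq; '*' is a stop, 'X' any codon containing N."""
    seq = seq.upper()
    return "".join(CODE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


# Prints XGAIAM, XV*X, XVY and XVS for the four junctions below.
for junction in ("NAAGGAGCGATCGCCATG",  # FIG. 16, Sgf I site before the ATG
                 "NNNGTTTAAACN",        # FIG. 17, Pme I site
                 "NNNGTTTATCN",         # FIG. 17, joined to an EcoRV end
                 "NNNGTTTCCAN"):        # FIG. 17, joined to a Bal I end
    print(junction, translate(junction))

# The first codon NAA encodes K, Q, E or a stop depending on N.
print([translate(b + "AA") for b in "ACGT"])  # ['K', 'Q', 'E', '*']
```

The FIG. 16 junction yields Gly-Ala-Ile-Ala-Met after its first codon. Because NAA can encode only Lys, Gln or Glu when it is not a stop, the "KEQ" annotation presumably denotes that first codon. The three FIG. 17 variants give Val-Ter, Val-Tyr and Val-Ser, matching the figure.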
Swa I, LIGASE

FIG. 18
N-TERMINAL Pac I-Sgf I FUSION SITE

NAAGGATTAATCGCCATGg
KEQGlyLeuIleAlaMet

C-TERMINAL Pme I-Swa I FUSION SITE

NNNGTTTAAATN
XaaValTer

FIG. 19A

N-TERMINAL Pac I-Sgf I FUSION SITE

NAAGGATTAATCGCCATGg
  -RBS-      -Kozak-

C-TERMINAL Pme I-Swa I FUSION SITE

NNNGTTTAAATN
XaaValTer

FIG. 19B
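The Python sketch below illustrates why the Pac I-Sgf I and Pme I-Swa I junctions of FIGS. 19A and 19B act as fusion sites. Pac I and Sgf I both leave a 2-nt AT 3' overhang, and Pme I and Swa I both cut blunt, so each pair can be ligated, yet the hybrid top-strand sequence matches neither parent site. The cut positions for Pac I, Sgf I and Pme I follow FIGS. 5A and 14A. Those for Swa I (ATTT^AAAT) and Pvu I (CGAT^CG) are standard enzyme data supplied here as assumptions. The table and function names are hypothetical. Only the eight bases at the joint are examined, not the flanking sequence.

```python
# Hypothetical enzyme table: site, top-strand cut and bottom-strand cut,
# both counted in top-strand coordinates from the start of the site.
ENZYMES = {
    "PacI": ("TTAATTAA", 5, 3),  # TTAAT^TAA, 2-nt 3' overhang AT
    "SgfI": ("GCGATCGC", 5, 3),  # GCGAT^CGC, 2-nt 3' overhang AT
    "PvuI": ("CGATCG", 4, 2),    # CGAT^CG,   2-nt 3' overhang AT
    "PmeI": ("GTTTAAAC", 4, 4),  # GTTT^AAAC, blunt
    "SwaI": ("ATTTAAAT", 4, 4),  # ATTT^AAAT, blunt
}


def end_type(name):
    """Overhang bases and polarity (+1 = 3', -1 = 5', 0 = blunt)."""
    site, top, bottom = ENZYMES[name]
    lo, hi = sorted((top, bottom))
    return site[lo:hi], (top > bottom) - (top < bottom)


def fuse(left, right):
    """Top strand of the joint formed by the left half of one site and the
    right half of another; raises if the two ends cannot be ligated."""
    if end_type(left) != end_type(right):
        raise ValueError(f"{left} and {right} ends are not compatible")
    (l_site, l_top, _), (r_site, r_top, _) = ENZYMES[left], ENZYMES[right]
    return l_site[:l_top] + r_site[r_top:]


def recut_by(junction):
    """Enzymes in the table whose (palindromic) site occurs in the junction."""
    return [name for name, (site, _, _) in ENZYMES.items() if site in junction]


print(fuse("PacI", "SgfI"), recut_by(fuse("PacI", "SgfI")))  # TTAATCGC []
print(fuse("PmeI", "SwaI"), recut_by(fuse("PmeI", "SwaI")))  # GTTTAAAT []
print(recut_by(fuse("SgfI", "SgfI")))                        # ['SgfI', 'PvuI']
```

The last line also shows that a regenerated Sgf I junction would still carry a Pvu I site, since CGATCG lies inside GCGATCGC.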
Sgf I   Sgf I

FIG. 20A

FIG. 20B
[Plot; axis label: HOURS AT 37°]

FIG. 22A

[Chart; legend: E6/DE3, E6/AI, E6/RX; category: PRE-INDUCTION]

FIG. 22B
SEQUENCE LISTING

<110> Promega Corporation
      Slater, Michael R.
      Wood, Keith V.
      Hartnett, James Robert

<120> Vectors for Directional Cloning

<130> 341.030WO1

<150> 10/702,228
<151> 2003-11-05

<150> 10/678,961
<151> 2003-10-03

<160> 92

<170> FastSEQ for Windows Version 4.0

<210> 1
<211> 18
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 18
<223> n = A, T, G, or C
<400> 1
aaggagcgat cgccatgn 18

<210> 2
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment, wherein nnn is the first codon which is 3' to the start codon followed by the remainder of an open reading frame
<220>
<221> misc_feature
<222> 8-10
<223> n = A, T, G, or C
<400> 2
cgccatgnnn 10

<210> 3
<211> 12
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-6
<223> n = A, T, G, or C
<400> 3
nnnnnngtct tc 12

<210> 4
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-4
<223> n = A, T, G, or C
<400> 4
nnnngaagag 10

<210> 5
<211> 13
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 6-13
<223> n = A, T, C, or G
<400> 5
gcagcnnnnn nnn 13

<210> 6
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-5
<223> n = A, T, G or C
<400> 6
nnnnngagac g 11

<210> 7
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 4-8
<223> n = A, T, C, or G
<400> 7
gccnnnnngg c

<210> 8
<211> 14
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 6-14
<223> n = A, T, G, or C
<400> 8
ggatgnnnnn nnnn

<210> 9
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-5
<223> n = A, T, G or C
<400> 9
nnnnngagac c

<210> 10
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 6-10
<223> n = A, T, G, or C
<400> 10
gacgcnnnnn

<210> 11
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 3-9
<223> n = A, T, G, or C
<400> 11
ccnnnnnnng g

<210> 12
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 3-9
<223> n = A, T, G, or C
<400> 12
gcnnnnnnng c

<210> 13
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-5
<223> n = A, T, C, or G
<400> 13
nnnnngagac

<210> 14
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 4-8
<223> n = A, T, G, or C
<400> 14
ccannnnntg g

<210> 15
<211> 15
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 6-15
<223> n = A, T, G, or C
<400> 15
gtcccnnnnn nnnnn

<210> 16
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-4
<223> n = A, T, G, or C
<400> 16
nnnngaagag c

<210> 17
<211> 14
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-8
<223> n = A, T, G, or C
<400> 17
nnnnnnnngc aggt

<210> 18
<211> 14
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-9
<223> n = A, T, G, or C
<400> 18

<210> 19
<211> 12
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 4-9
<223> n = A, T, G, or C
<400> 19
ccannnnnnt gg

<210> 20
<211> 13
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 5-9
<223> n = A, T, G, or C
<400> 20
ggccnnnnng gcc

<210> 21
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-5
<223> n = A, T, G, or C
<400> 21
nnnnngatcc

<210> 22
<211> 22
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 7-22
<223> n = A, T, G, or C
<400> 22
ctggagnnnn nnnnnnnnnn nn 22

<210> 23
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 4-7
<223> n = A, T, G, or C
<400> 23
gatnnnnatc 10

<210> 24
<211> 4
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 24
Thr Cys Thr Ser
1

<210> 25
<211> 14
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 25
Thr Cys Cys Ser Ala Asn Asn Ile Met Thr Asn Lys Ser Arg
1               5                   10

<210> 26
<211> 12
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 26
Thr Cys Ala Ser Thr Asn Asn Phe Leu Ser Tyr Cys
1               5                   10

<210> 27
<211> 19
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 27
Thr Gly Thr Cys Arg Asn Asn Ile Met Val Thr Ala Asn Lys Asp Glu
1               5                   10                  15
Ser Arg Gly

<210> 28
<211> 13
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 28
Thr Asn Asn Phe Leu Ser Tyr Cys Trp Ala Thr Cys Ile
1               5                   10

<210> 29
<211> 12
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 29
Thr Cys Thr Ser Cys Asn Asn Leu Pro His Gln Arg
1               5                   10

<210> 30
<211> 12
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 30
Thr Gly Thr Cys Cys Asn Asn Leu Pro His Gln Arg
1               5                   10

<210> 31
<211> 14
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 31
Thr Asn Gly Leu Ser Trp Cys Asn Asn Leu Pro His Gln Arg
1               5                   10

<210> 32
<211> 4
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 32
Thr Gly Asn Cys
1

<210> 33
<211> 4
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 33
Thr Cys Tyr Ser
1

<210> 34
<211> 4
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 34
Thr Cys Ala Ser
1

<210> 35
<211> 12
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 35
Thr Gly Cys Cys Thr Asn Asn Phe Leu Ser Tyr Cys
1               5                   10

<210> 36
<211> 12
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 36
Thr Gly Cys Cys Cys Asn Asn Leu Pro His Gln Arg
1               5                   10

<210> 37
<211> 12
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 37
Thr Cys Thr Ser Cys Asn Asn Leu Pro His Gln Arg
1               5                   10

<210> 38
<211> 12
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 38
Thr Ala Thr Tyr Cys Asn Asn Leu Pro His Gln Arg
1               5                   10

<210> 39
<211> 4
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 39
Thr Cys Gly Ser
1

<210> 40
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 4-7
<223> n = A, T, G, or C
<400> 40
caynnnnrtg

<210> 41
<211> 10
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 41
Thr Gly Cys Cys Ala Tyr Asn Ile Met Thr
1               5                   10

<210> 42
<211> 18
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 42
Thr Cys Cys Ser Trp Asn Asn Ile Met Thr Asn Lys Ser Arg Phe Leu
1               5                   10                  15
Tyr Cys

<210> 43
<211> 4
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 43
Thr Cys Cys Ser
1

<210> 44
<211> 14
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 44
Thr Tyr Ala Phe Leu Ser Cys Asn Asn Leu Pro His Gln Arg
1               5                   10

<210> 45
<211> 17
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 45
Thr Gly Cys Cys Tyr Asn Asn Phe Leu Ser Tyr Cys Leu Pro His Gln
1               5                   10                  15
Arg

<210> 46
<211> 14
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 46
Thr Asn Asn Phe Leu Ser Tyr Cys Trp Arg Thr Gly Met Val
1               5                   10

<210> 47
<211> 14
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 47
Thr Gly Cys Cys Ala Asn Asn Ile Met Thr Asn Lys Ser Arg
1               5                   10

<210> 48
<211> 12
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 48
Thr Gly Gly Cys Cys Asn Asn Leu Pro His Gln Arg
1               5                   10

<210> 49
<211> 15
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 49
Thr Asn Cys Phe Ser Tyr Cys Cys Asn Asn Leu Pro His Gln Arg
1               5                   10                  15

<210> 50
<211> 14
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 50
Thr Cys Gly Ser Ala Asn Asn Ile Met Thr Asn Lys Ser Arg
1               5                   10

<210> 51
<211> 12
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 51
Thr Cys Lys Ser Gly Asn Asn Val Ala Asp Glu Gly
1               5                   10

<210> 52
<211> 13
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 52
Thr Asn Asn Phe Leu Ser Tyr Cys Trp Gly Thr Gly Val
1               5                   10

<210> 53
<211> 12
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 53
Thr Gly Thr Ser Gly Asn Asn Val Ala Asp Glu Gly
1               5                   10

<210> 54
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 4-7
<223> n = A, T, G, or C
<400> 54
gacnnnngtc

<210> 55
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 4-7
<223> n = A, T, G or C
<400> 55
gaannnnttc

<210> 56
<211> 13
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 56
Thr Asn Asn Phe Leu Ser Tyr Cys Trp Gly Thr Cys Val
1               5                   10

<210> 57
<211> 12
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 57
Thr Cys Thr Ser Gly Asn Asn Val Ala Asp Glu Gly
1               5                   10

<210> 58
<211> 4
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 58
Thr Ala Cys Tyr
1

<210> 59
<211> 13
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 59
Thr Ala Cys Tyr Thr Asn Asn Phe Leu Ser Tyr Cys Trp
1               5                   10

<210> 60
<211> 12
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 60
Thr Gly Gly Cys Gly Asn Asn Val Ala Asp Glu Gly
1               5                   10

<210> 61
<211> 14
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 61
Thr Gly Thr Ser Ala Asn Asn Ile Met Thr Asn Lys Ser Arg
1               5                   10

<210> 62
<211> 8
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 62
Thr Gly Gly Cys Gly Cys Asn Ala
1               5

<210> 63
<211> 14
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 63
Thr Ala Thr Tyr Ala Asn Asn Ile Met Thr Asn Lys Ser Arg
1               5                   10

<210> 64
<211> 13
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 64
Thr Cys Cys Ser Thr Asn Asn Phe Leu Ser Tyr Cys Trp
1               5                   10

<210> 65
<211> 12
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 65
Thr Thr Ala Leu Cys Asn Asn Leu Pro His Gln Arg
1               5                   10

<210> 66
<211> 13
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 66
Thr Asn Asn Phe Leu Ser Tyr Cys Trp Thr Thr Cys Phe
1               5                   10

<210> 67
<211> 12
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 67
Thr Gly Thr Ser Cys Asn Asn Leu Pro His Gln Arg
1               5                   10

<210> 68
<211> 14
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 68
Thr Thr Ala Leu Ala Asn Asn Ile Met Thr Asn Lys Ser Arg
1               5                   10

<210> 69
<211> 17
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 14
<223> n = A, T, G, or C
<400> 69
aaggagcgat cgcnatg

<210> 70
<211> 15
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-3
<223> n = A, T, C, or G, wherein n1-n3, n2n3G, or n3GC is a codon which is not a stop codon
<400> 70
nnngcgatcg ccatg

<210> 71
<211> 12
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment, wherein the complement to the remainder of an open reading frame is present 5' to nnn.
<220>
<221> misc_feature
<222> 1-3
<223> n = A, T, G, or C
<400> 71
nnncatggcg at 12

<210> 72
<211> 12
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-3
<223> n = A, T, G or C, wherein n1-n3 is a codon that does not encode a stop codon
<220>
<221> misc_feature
<222> 8-9
<223> n = A, T, G, or C, wherein Tn8n9 is a codon that does not code for a stop codon
<220>
<221> misc_feature
<222> 10-12
<223> n = A, T, C or G
<400> 72
nnngtttnnn nn

<210> 73
<211> 18
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 6-18
<223> n = A, T, G or C
<400> 73
ggatgnnnnn nnnnnnnn

<210> 74
<211> 18
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-13
<223> n = A, T, G, or C
<400> 74
nnnnnnnnnn nnncatcc

<210> 75
<211> 15
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 8-15
<223> n = A, T, G, or C
<400> 75
cacctgcnnn nnnnn

<210> 76
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 8-11
<223> n = A, T, G, or C
<400> 76
gctcttcnnn n

<210> 77
<211> 13
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 5-9
<223> n = A, T, G, or C
<400> 77
ggccnnnnng gcc

<210> 78
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 8-11
<223> n = A, T, G, or C
<400> 78
gctcttcnnn n

<210> 79
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 3-9
<223> n = A, T, G, or C
<400> 79
ccnnnnnnng g

<210> 80
<211> 13
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 5-9
<223> n = A, T, G or C
<400> 80
ggccnnnnng gcc

<210> 81
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 4-8
<223> n = A, T, G, or C
<400> 81
gcannnnntg c

<210> 82
<211> 18
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 7-18
<223> n = A, T, G, or C
<400> 82
cccacannnn nnnnnnnn

<210> 83
<211> 19
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1
<223> n = A, T, G, or C
<400> 83
naaggagcga tcgccatgg

<210> 84
<211> 18
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1
<223> n = A, T, G, or C
<400> 84
naaggagcga tcgccatg

<210> 85
<211> 8
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 85
Lys Glu Gln Gly Ala Ile Ala Met
1               5

<210> 86
<211> 12
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-3, 12
<223> n = A, T, G, or C
<400> 86
nnngtttaaa cn

<210> 87
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-3, 11
<223> n = A, T, G, or C
<400> 87
nnngtttatc n

<210> 88
<211> 11
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-3, 11
<223> n = A, T, G, or C
<400> 88
nnngtttcca n

<210> 89
<211> 19
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1
<223> n = A, T, G, or C
<400> 89
naaggattaa tcgccatgg

<210> 90
<211> 8
<212> PRT
<213> Artificial Sequence
<220>
<223> A synthetic peptide
<400> 90
Lys Glu Gln Gly Leu Ile Ala Met
1               5

<210> 91
<211> 12
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 1-3, 12
<223> n = A, T, G or C
<400> 91
nnngtttaaa tn

<210> 92
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> A synthetic DNA fragment
<220>
<221> misc_feature
<222> 7-10
<223> n = A, T, G or C
<400> 92
ctcttcnnnn